ABSTRACT


The  asymptotic  behaviour  of  parameter  estimates  and  the  identification 
and  modeling  of  dynamical  systems  are  investigated.  Measures  of  the 
relevant  information  in  a given  sequence  of  observations  are  defined 
and  shown  to  possess  useful  properties,  such  as  the  metric  property  on 
the  parameter  set.  The  convergence  of  maximum  likelihood  and  related 
Bayesian  estimates  for  general  observation  sequences  is  investigated. 

The situation where the true parameter is not a member of a given
parameter set is considered as well as the situation where the parameter
set includes the true model. The finite parameter set case is emphasized
for simplicity in the convergence analysis, but the results are extended
in general terms to the infinite parameter case. It is shown that under
uniqueness conditions on the output statistics of linear dynamical
systems, identification procedures converge to the true model if it is a
member of a given model set. If the true model is not a member of the
set, then the estimates converge to a model in the set that is closest to
the actual system in the information metric sense. Stationary and
non-stationary systems are considered. Rates of convergence in the mean
are obtained, and the separate contributions of the stochastic and the
deterministic parts of the input to the convergence rates are shown. The
analysis also suggests methods for approximating a high order system by
a low order model and for selecting a representative model from a given
model set, applicable to infinite and even non-compact model sets.
CHAPTER  I


INTRODUCTION 

This thesis is concerned with some fundamental questions associated
with the common problem of assigning a mathematical model to a physical
phenomenon, using a set of observations. The situation is complicated
by the fact that the relationship between the observations and the
sought mathematical model is uncertain and can only be specified in a
probabilistic framework. For mathematical tractability the problem is
formulated as one of selecting via some criterion the "best" model
from a specified set of models. The formulation of the mathematical
problem requires, then, the choice of a model set on the one hand and
the choice of a model selection criterion on the other. The first
choice presents an obvious tradeoff. The more strictly the model set
is specified, the more tractable is the mathematical solution, but the
less probable is the case that a correct model is included in the
specified model set. As an illustration, consider the two extreme
situations. If the model set consists of a single model, then the
selection is trivial, but the model may not be an adequate
representative of the observed phenomenon. On the other hand, if the
model set is the abstract "set" of "all models", then it obviously
contains the correct model, but a mathematical solution (or formulation)
of the model selection problem is then not feasible.


The model set can be naturally specified in terms of a parameter
set, such that to each parameter there corresponds a model and vice
versa. The terms model set and parameter set will be used
interchangeably and precise relationships between them are defined in
the thesis. The model selection problem can then be naturally defined
as a parameter estimation problem. Given a parameter set, the problem
formulation requires the selection of a parameter estimation criterion.
The true parameter cannot, in general, be assumed to belong to the
prespecified parameter set, as asserted above. It turns out that the
maximum likelihood estimate, defined in Chapter 2, is most adequate for
this situation. On the other hand, the Bayesian methods of maximum
a posteriori probability and least squares, also defined in Chapter 2,
intrinsically assume that the true parameter is a member of the model
set.

One objective of this thesis is to provide in a very general
setting answers to the following questions: Under what conditions do
the maximum likelihood and the Bayesian estimates converge to some
parameter in the parameter set? What distinguishes the selected model
from the other models in the model set and what is its relationship to
the true model? For the selection of an estimation procedure, is it
reasonable to assume that the true parameter is a member of the set when
it is not? Is the true model selected when it is a member of the model
set? A question that arises naturally in this setting is: what is the
best approximation of a complex model by a simple one?

A particular problem of considerable practical significance is that
of dynamic system identification. The situation described above, and
the questions raised, naturally apply to the system identification
problem. In fact, this research has been motivated by the problem of
identifying the dynamic equations of an aircraft during its operation
throughout the flight envelope for the purpose of adaptive control. We
analyze the asymptotic behaviour of system identification procedures in
the presence and in the absence of the true model in a given model set.
The analysis also suggests a systematic approach to certain system
modeling problems of practical significance.

A major part of the analysis in this thesis will be restricted to
the case where the model set is finite. This restriction serves several
purposes. We chose to emphasize the statistical properties of the
observation sequences involved (such as their content of information)
and to avoid considerations of topological conditions on the parameter
set, which are unavoidable if results for e.g. infinite compact
parameter sets are desired. This makes the analysis considerably
simpler, and enables us to consider very general classes of observation
sequences. It is nevertheless demonstrated in Chapter 7 that the results
obtained in this thesis for finite parameter sets may be extended to
compact sets by additional requirements on the topology of the set, such
as uniform continuity of the density functions involved. Further
research in this direction is recommended.


In addition to the above consideration, the case of finite parameter
sets has considerable practical significance as a method of
approximation.  Identification  techniques  for  finite  sets  of  models 
are  considerably  faster  than  those  for  infinite  sets,  as  the  search 
procedure  for  the  parameter  satisfying  the  estimation  criterion  is 
practically  trivial.  In  fact,  this  thesis  makes  a strong  case  for  the 
finite  model  set,  taking  the  viewpoint  that  the  true  model  is  in  most 
cases  not  included  in  any  prespecified  set  of  models.  Identification 
is  thus  a procedure  of  finding  an  approximate  model  whether  a finite  or 
an  infinite  model  set  is  considered.  The  approximation  is  nevertheless 
"coarser"  when  fewer  models  are  included  in  the  model  set. 

It  should,  however,  be  emphasized  that  a substantial  portion  of  the 
thesis  applies  to  parameter  sets  that  may  be  infinite  and  even  non- 
compact.  This  is  the  case  in  the  derivation  of  distance  measures  on 
the  parameter  set  and  the  consideration  of  system  modeling  problems. 

For  comparison  with  earlier  results  we  note  that  the  convergence  of 
the parameter estimates is considered in this thesis in the probabilistic
senses of convergence almost everywhere (a.e.) and convergence in
the  mean  square  (m.s.),  which  will  be  defined  in  Chapter  2.  Consistency 
is  traditionally  defined  as  convergence  a.e.  of  the  estimates  to  the 
true  parameter  when  it  is  included  in  the  parameter  set. 

1.1  Historical  Review 


Parameter estimation techniques have been studied ever since the
introduction of the maximum a posteriori probability (MAP) and the
least squares (LS) criteria by Gauss [1809] and Laplace [1820], and
their later studies by Edgeworth [1908]. Fisher [1922] proposed the
maximum likelihood (ML) estimate, which has since gained considerable
popularity due to its intuitive appeal and its asymptotic properties
(e.g. LeCam [1953]).

The  consistency  of  ML  estimates  for  sequences  of  independent  and 
identically  distributed  (i.i.d.)  observations  was  proved  by  Cramer  [1946] 
who  assumed  differentiability  to  4*th  order  of  the  probability  density 
functions  involved.  Differentiability  assumptions  were  dispensed  with 
in proofs by Doob [1934] and Wald [1949]. The main tool in proving
consistency for i.i.d. observations is, naturally, the strong law of
large numbers. Roussas [1965] proved the consistency of ML estimates for
the case of ergodic Markov observation sequences, employing the ergodic
theorem. The m.s. convergence of LS estimates given i.i.d. observations
was considered by Liporace [1971], who showed, via the multiplication
rule for independent random variables, that the mean square error of
these estimates is exponentially diminishing. In the case where the
true parameter is not included in the parameter set, the estimates were
shown to converge to a parameter in the set, which is most similar to
the true parameter. The measure of similarity suggested by Liporace is
related to the information measures introduced in this thesis. Caines
[1975a] proved and applied the submartingale property of sequences of
maximized likelihood ratios on finite parameter sets to prove the
consistency of ML estimates on such sets for a general class of
observation sequences, satisfying a certain probabilistic condition.
Baram and Sandell [1976] extended Caines' results to Bayesian estimates,
which were shown to be consistent a.e. and in the mean square, and showed
that Caines' condition applies to stationary Gaussian linear systems.

The identification of linear dynamical systems employing parameter
estimation techniques has been studied intensively for over a decade.
However, several consistency proofs that have appeared in the early
literature have overlooked the fact that for consistent estimation on
compact parameter sets, uniform convergence of the associated probability
densities on the parameter set is necessary, while pointwise convergence
only provides consistency for finite parameter sets. Correct consistency
proofs have appeared in the literature in recent years. Caines and
Rissanen [1974] (see also Rissanen and Caines [1974]) proved the
consistency of ML estimates for autoregressive and moving average (ARMA)
observation sequences. Ljung proved the consistency of a general class of
stochastic approximation techniques [1974a] and the consistency of a
class of prediction error techniques [1974b] (see also Ljung [1975]).
Caines [1975b] proved consistency for stationary processes of a more
general class of prediction error techniques, which includes the
maximum likelihood technique for the case of stationary Gaussian
observation sequences. The topological requirements specified by Caines
[1975b] reduce in the finite parameter set case to a requirement that
there exist a 1 to 1 correspondence between the parameter set and the
set of system's impulse responses, corresponding to the system's
innovations representation. Similar conditions were suggested by Tse and
Weinert [1975] (see also Tse [1976]) and by Hawkes and Moore [1976] (see
also Moore and Hawkes [1974]), who considered the convergence of Bayesian
estimates on finite sets of stationary Gaussian linear systems. The
condition suggested by Baram and Sandell [1976] is a uniqueness
condition on the output statistics associated with the different models
in the model set. Other statistical conditions are motivated and derived
in this thesis. We shall comment on the correspondence between
parametric and statistical conditions in Chapter 7 as we suggest further
study of this subject.

Information methods have been suggested by many authors for the
solution of the related problems of hypothesis testing, signal selection
and model identification. In recent years Kullback's information
measure (Kullback [1959]) has proved to be useful in the analysis of
parameter estimation and model identification techniques. Akaike
([1972], [1974]) has related Kullback's information with certain
versions of the ML criterion. Kullback's information measure was employed
by Liporace [1971], and, following Liporace, by Hawkes and Moore [1976]
in their studies of parameter estimates given i.i.d. and stationary
Gaussian observations. In this thesis we define and employ information
measures, which prove to possess valuable properties lacked by Kullback's
information measure, such as the metric property on the parameter
space. Other information measures defined and employed in the
literature will be mentioned in Chapter 3 as they are compared with the
information measures defined in this thesis.

1.2  Organization  and  Results

In  the  first  part  of  the  thesis  (Chapters  2,  3 and  4)  we  consider 
general classes of observation sequences and parameter sets. The results
are specialized to linear dynamical systems in Chapters 5 and 6.
Familiarity with advanced concepts of probability theory is only required
in Chapter 2 and parts of Chapter 4. The sequence of Chapter 3,
sections  4.1  and  4.4,  Chapter  5 and  section  6.3  provides  a consistent 
discussion  of  the  information  approach  to  system  identification  and 
modeling,  which  is  the  mainstream  of  the  thesis.  The  rest  of  Chapter  4 
is  believed  to  be  of  theoretical  interest  and  also  of  practical  value, 
which  is  demonstrated  in  sections  6.1  and  6.2. 

In  Chapter  2 we  present  the  underlying  probabilistic  set  up  for  the 
thesis  and  recall  definitions  and  results  from  probability  and  estimation 
theory  used  in  the  thesis.  Since  parameter  estimates  may  be  based  on 
the  possibly  incorrect  assumption  that  the  true  parameter  is  a member  of 
a given  parameter  set,  we  define  the  different  probability  spaces  in 
which  the  estimates  are  defined  and  in  which  the  analysis  is  performed. 

In Chapter 3 we define two measures of the relevant information in
each observation favoring one parameter in the parameter set against
another. Both measures will prove useful in later analysis. The
information measures are shown to be metrics, or distance measures, on
the parameter set and to provide a measure of closeness of each
parameter in the set to the true parameter, which is not necessarily a
member of the set. The information measures defined in this chapter are
compared with other measures of information common in statistics and
information theory.

In  Chapter  4 we  investigate  the  convergence  of  maximum  likelihood 
and  Bayesian  parameter  estimates  for  general  classes  of  observation 
sequences. Consistency conditions are derived in terms of the information
in the observations and extended to the case where the true parameter
is not a member of the parameter set. Rates of convergence in the
mean  for  the  ML  and  MAP  procedures  are  also  derived. 

In Chapter 5 we analyze the identification and modeling of
stationary Gaussian linear systems. We show that the identification
procedures under consideration converge under a certain uniqueness
condition to the true model if it is included in the model set. If the
true model is not a member of the model set, the identification
procedures converge to the model in the set whose output statistics are
best matched to those of the true model. The selected model is also shown
to be closest to the true model in the information metric sense. It is
then shown that under the uniqueness condition likelihood ratios and
a posteriori probability ratios converge in the mean at rates faster
than exponential. The analysis also suggests solutions to other modeling
problems, such as the approximation of a complex system by a simple
model and an optimal representation of a model set by a single model.

In Chapter 6 we consider general classes of time varying linear
systems. In particular, we interpret for such systems the information
conditions derived in Chapter 4, and obtain consistency conditions in
terms of the output statistics associated with the different models
in the model set. The convergence of the likelihood and the
a posteriori probability ratios is investigated and the separate
contributions of the stochastic and the deterministic parts of the input
to the information and, consequently, to the convergence rates are
shown.

In Chapter 7 we suggest further research on possible extensions and
applications of the theory. In particular, we show how the convergence
results obtained in this thesis for finite sets of parameters may be
extended to compact parameter sets. We also suggest further
investigation of the problem of existence and uniqueness of a solution
to the estimation, or identification, problem. Then we suggest further
study of the identifiability of dynamic systems via application of
deterministic input sequences. Finally, we suggest applications of the
theory to classes of problems not directly addressed in this thesis,
such as the identifiability of non-linear systems and periodically
varying linear systems.


CHAPTER  II 

PRELIMINARIES:  PROBABILITY  SPACES,  PARAMETER 

ESTIMATES  AND  STOCHASTIC  CONVERGENCE 

The purpose of this chapter is to present the underlying mathematical
set up for this thesis and to recall definitions and results from
probability and estimation theory that will be used in the following
chapters.

Since a major objective of this thesis is to analyze, using correct
assumptions, parameter estimates that may be based on incorrect
assumptions, it is essential to define at the outset the different
probabilistic frameworks in which the estimates are defined and in which
the analysis is performed. We first introduce the correct framework in
which the analysis is performed. It consists of an underlying probability
space and a separate parameter space, of which the true parameter may
or may not be a member. Likelihood ratios and maximum likelihood
estimates are naturally defined in this framework. On the other hand,
Bayesian parameter estimates are defined in a different framework where
the parameter space is a part of the underlying sample space.
Consequently, the existence of a probability measure defined on the
parameter space (i.e. assigning to each set in the parameter space the
probability that it includes the true parameter) is postulated. The
Bayesian framework then inherently includes the assumption that the true
parameter is a member of the given parameter space, and is inadequate for
the analysis of the general case considered in this study. Thus, while
the Bayesian set up is assumed in the definition of Bayesian estimates,
the analysis of these estimates, as well as the maximum likelihood
estimate, is performed using the underlying, non-Bayesian framework.

Readers unfamiliar with the notion of measure and probability
spaces may identify here, and in the following chapters, the functions
$f(z^n)$, $f(z_n \mid z^{n-1})$ and $f(s \mid z^n)$ with the familiar
probability density, conditional probability density and a posteriori
probability density functions on Euclidean observation and parameter
spaces. Several symbols and terms, mostly standard in probability and
estimation theory, are introduced in this chapter. For other terms and
symbols, defined throughout the thesis where they are used, the reader is
referred to the symbol list.


2.1  Observations,  Parameters  and  Likelihood  Ratios 


Consider a measurable space $(\Omega, \mathcal{U})$ where $\Omega$ is
some sample space and $\mathcal{U}$ is a $\sigma$-algebra of subsets of
$\Omega$. The observation sequence $\{z_n\}$ is a stochastic process on a
probability space $(\Omega, \mathcal{U}, P_*)$ with values in a
measurable space $(D, \mathcal{V})$, called the observation space. We
shall be interested in the case $(D, \mathcal{V}) = (R^\ell,
\mathcal{B}^\ell)$ where $R^\ell$ is the $\ell$-dimensional Euclidean
space and $\mathcal{B}^\ell$ is the $\sigma$-algebra of Borel sets in
$R^\ell$. We call $P_*$ the true measure and $*$ the true parameter.

The parameter space $S$ is a set such that for each $s \in S$ there
exists a probability measure $P_s$ defined on $(\Omega, \mathcal{U})$.
Let $T = (* \cup S)$. Obviously, $* \in T$, but $*$ need not belong to
the set $S$.

For each $s \in T$ we denote by $E_s$ expectation taken with respect to
$P_s$. We use the notation a.e. (almost everywhere) to denote events of
$P_*$ measure one. Events of $P_s$ measure one will be denoted a.e. $P_s$.

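Recall that the conditional expectation of a random variable $x$ on
$(\Omega, \mathcal{U}, P)$ given a $\sigma$-subalgebra $\mathcal{A}
\subset \mathcal{U}$ is an $\mathcal{A}$-measurable random variable
denoted $E^{\mathcal{A}}(x)$ such that

$$E\, E^{\mathcal{A}}(x) = E(x) \tag{2.0}$$

For each $s \in S$ we shall denote by $E_s^{\mathcal{A}}$ the conditional
expectation given $\mathcal{A}$, taken with respect to $P_s$.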

If $\mu$ and $\nu$ are measures defined on $(\Omega, \mathcal{U})$ then
$\mu$ is said to be absolutely continuous with respect to $\nu$ if for
any set $A \in \mathcal{U}$, $\nu(A) = 0$ implies $\mu(A) = 0$. $\mu$ is
said to be singular(*) with respect to $\nu$ if it is not absolutely
continuous with respect to $\nu$.

(*) This is not a standard definition. For a definition of mutually
singular measures see Rudin [1966], p. 121.

Let $\{\mathcal{U}_n\} = \{\mathcal{U}(z^n)\}$ be the increasing family
of $\sigma$-subalgebras of $\mathcal{U}$, generated by

$$z^n = (z_1, \ldots, z_n) \tag{2.1}$$

For each $s \in T$ and for each $n \ge 0$ let $P_{s,n}$ denote the
restriction of $P_s$ to $\mathcal{U}_n$. Suppose that for each $n \ge 0$
the measures $P_{s,n}$ are absolutely continuous with respect to some
measure $\lambda_n$ defined on $(\Omega, \mathcal{U}_n)$. Then

$$f_{s,n} \triangleq \frac{dP_{s,n}}{d\lambda_n} \; ; \quad s \in T \tag{2.2}$$

are the Radon-Nikodym derivatives (or densities) between the respective
measures. The likelihood ratio between two parameters $s, t \in T$ is
defined as


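$$h^{s}_{t,n} \triangleq \frac{dP_{s,n}}{dP_{t,n}} = \frac{f_{s,n}}{f_{t,n}} \tag{2.3}$$

provided that $P_{s,n}$ is absolutely continuous with respect to
$P_{t,n}$. When the time parameter $n$ is included in the argument we
shall use the somewhat shorter notation

$$f_s(a(n)) \triangleq f_{s,n}(a(n)) \; ; \quad s \in T \; ; \quad a(n) \in \mathcal{U}_n$$

in particular

$$f_s(z^n) \triangleq f_{s,n}(z^n) \; ; \quad s \in T \tag{2.4}$$

$$h^{s}_{t}(z^n) \triangleq h^{s}_{t,n}(z^n) \; ; \quad s, t \in T \tag{2.5}$$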


For any $c \in \mathcal{U}_n$ and $b \in \mathcal{U}_n$ such that
$f_{s,n}(b) \ne 0$ for all $s \in S$, the conditional densities of $c$
given $b$ are

$$f_{s,n}(c \mid b) \triangleq \frac{f_{s,n}(c, b)}{f_{s,n}(b)}$$

in particular

$$f_s(z_n \mid z^{n-1}) = \frac{f_s(z^n)}{f_s(z^{n-1})} \; ; \quad s \in T \tag{2.6}$$
The conditional likelihood ratios are then defined as

$$h^{s}_{t}(z_n \mid z^{n-1}) \triangleq \frac{f_s(z_n \mid z^{n-1})}{f_t(z_n \mid z^{n-1})} \; ; \quad s, t \in T \tag{2.7}$$

for any $z^n$ such that $f_t(z_n \mid z^{n-1}) \ne 0$ for $t \in T$.
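As a minimal sketch of how (2.6) and (2.7) can be used numerically: repeated application of (2.6) factors $f_s(z^n)$ into a product of conditional densities, so the logarithm of the likelihood ratio accumulates as a sum of logarithms of conditional likelihood ratios. The scalar Gaussian AR(1) model, the known initial condition `z0` and all function names below are assumptions introduced for illustration only; they are not taken from the thesis.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log of the N(mean, var) density evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def conditional_logpdf(z_n, z_prev, model):
    """log f_s(z_n | z^{n-1}) for the assumed AR(1) model
    z_n = a * z_{n-1} + w_n, with w_n ~ N(0, var) and model = (a, var)."""
    a, var = model
    return gaussian_logpdf(z_n, a * z_prev, var)


def log_likelihood_ratio(z, model_s, model_t, z0=0.0):
    """log h_t^s(z^n), accumulated from the conditional ratios of (2.7)."""
    total = 0.0
    z_prev = z0  # assumed known initial condition
    for z_n in z:
        total += conditional_logpdf(z_n, z_prev, model_s)
        total -= conditional_logpdf(z_n, z_prev, model_t)
        z_prev = z_n
    return total


if __name__ == "__main__":
    observations = [0.5, -0.2, 0.1, 0.4]
    print(log_likelihood_ratio(observations, (0.9, 1.0), (0.5, 1.0)))
```

Working with logarithms is a design choice rather than part of the definitions: it avoids numerical underflow when $n$ is large, and it leaves the comparison between two parameters unchanged.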

The following condition will be assumed throughout the thesis:

(C2.1) For all $s \in S$ the probability measures $P_{s,n}$ are mutually
absolutely continuous.


2.2  Bayesian  Probability  Densities 

Consider a measurable space $(\Omega, \mathcal{U})$, where $\Omega$ is
some sample space and $\mathcal{U}$ is a $\sigma$-algebra of subsets of
$\Omega$, and a measurable space $(S, \mathcal{U}^s)$, where $S$ is the
parameter space and $\mathcal{U}^s$ is a $\sigma$-algebra of subsets of
$S$. Let $(\Omega^b, \mathcal{U}^b)$ be a measurable space, where

$$\Omega^b = \Omega \times S$$

and

$$\mathcal{U}^b \triangleq \mathcal{U} \times \mathcal{U}^s$$

are the cartesian products of the respective sample spaces and
$\sigma$-algebras. Let $P^b$ be a measure on $(\Omega^b, \mathcal{U}^b)$.
We denote by $E^b$ expectation and by $E^{b,\mathcal{A}}$;
$\mathcal{A} \subset \mathcal{U}^b$ conditional expectation given
$\mathcal{A}$, taken with respect to $P^b$. We call the restriction
$P^b_0$ of $P^b$ to $(S, \mathcal{U}^s)$ the a priori probability
measure on $(S, \mathcal{U}^s)$. Suppose that $P^b_0$ is absolutely
continuous with respect to some measure $\nu_0$ on $(S, \mathcal{U}^s)$,
then the density

$$f^b_0 \triangleq \frac{dP^b_0}{d\nu_0} \tag{2.8}$$

is well defined. In particular, we call

$$f^b_0(s) \; ; \quad s \in S \tag{2.9}$$

the a priori probability density on $S$ with respect to the measure
$\nu_0$.

b 

Let $\{z_n\}$ be a stochastic process on the probability space
$(\Omega^b, \mathcal{U}^b, P^b)$ with values in a measurable space
$(D, \mathcal{V})$, and let $\{\mathcal{U}^b_n\} \triangleq
\{\mathcal{U}^b(z^n)\}$ be the increasing sequence of
$\sigma$-subalgebras of $\mathcal{U}^b$, generated by
$z^n \triangleq (z_1, \ldots, z_n)$. Let $P^b_n$; $n \ge 1$ be the
restriction of $P^b$ to $\mathcal{U}^b_n$ and for each $n \ge 1$ let
$P^b_n$ be absolutely continuous with respect to some measure $\nu_n$
defined on $(\Omega^b, \mathcal{U}^b_n)$. Then the density

$$f^b_n \triangleq \frac{dP^b_n}{d\nu_n} \tag{2.10}$$

is well defined. We shall be particularly interested in the a posteriori
probability density of $s$, given $z^n$

$$f^b(s \mid z^n) \triangleq f^b_n(s \mid z^n) = \frac{f^b_n(s, z^n)}{f^b_n(z^n)} \; ; \quad n \ge 1 \tag{2.11}$$

assuming $f^b_n(z^n) \ne 0$.
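Let the parameter set $S$ be finite, i.e.

$$S = \{ s_j \; ; \; j \in K = (0, \ldots, p) \}$$

For each $j \in K$ let

$$\nu_j(s) \triangleq \begin{cases} 1 & s = s_j \\ 0 & s \ne s_j \end{cases}$$

Then

$$\nu_0 \triangleq \sum_{j=0}^{p} \nu_j \tag{2.12}$$

is a measure ("the counting measure") on $(S, \mathcal{U}^s)$. Let
$\lambda_n$ be a measure on $(\Omega, \mathcal{U}_n)$; then
$\nu_n = \nu_0 \times \lambda_n$; $n \ge 1$ is the product measure on
$(\Omega^b, \mathcal{U}^b_n)$. Suppose that $P^b_n$ is absolutely
continuous with respect to $\nu_n$ (i.e. the entire measure $P^b$ is
concentrated on the set $\Omega \times \{s_i \; ; \; i \in K\}$) for all
$n \ge 0$, then we have

$$f^b(z^n) = \int_S f^b(s, z^n)\, d\nu_0 = \sum_{i=0}^{p} f^b(s_i, z^n) = \sum_{i=0}^{p} f^b_0(s_i)\, f^b(z^n \mid s_i) \tag{2.15}$$

where we have applied Bayes rule

$$f^b(s_i, z^n) = f^b_0(s_i)\, f^b(z^n \mid s_i) \tag{2.16}$$

Substituting (2.15) and (2.16) into (2.11) yields for each $j \in K$

$$f^b(s_j \mid z^n) = \frac{f^b_0(s_j)\, f^b(z^n \mid s_j)}{\sum_{i=0}^{p} f^b_0(s_i)\, f^b(z^n \mid s_i)} \tag{2.17}$$

Note that

$$f^b(z^n \mid s_j) = f_{s_j}(z^n) \tag{2.18}$$

where the right hand side is defined by (2.4). Thus, finally, for the
finite parameter set $S$

$$f^b(s_j \mid z^n) = \frac{f^b_0(s_j)\, f_{s_j}(z^n)}{\sum_{i=0}^{p} f^b_0(s_i)\, f_{s_i}(z^n)} \tag{2.19}$$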
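The following is a minimal sketch of evaluating the a posteriori probabilities on a finite parameter set, i.e. a prior-weighted likelihood for each parameter divided by the sum of such terms over the set. The function name, the use of log-likelihoods as input and the max-shift used for numerical stability are assumptions made for illustration; the thesis does not prescribe an implementation. At least one prior weight is assumed to be positive.

```python
import math


def posterior(log_likelihoods, prior):
    """A posteriori probabilities of s_0, ..., s_p on a finite parameter set.

    log_likelihoods[j] is assumed to hold log f_{s_j}(z^n) and prior[j] the
    a priori probability of s_j (non-negative, summing to one).
    """
    log_terms = [
        (math.log(p) if p > 0.0 else -math.inf) + ll
        for p, ll in zip(prior, log_likelihoods)
    ]
    shift = max(log_terms)  # subtract the maximum to avoid underflow
    weights = [math.exp(t - shift) for t in log_terms]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Three hypothetical models with a uniform prior.
    print(posterior([-12.3, -10.1, -15.8], [1.0 / 3] * 3))
```

Because only ratios of likelihoods enter the result, a common factor in all densities (and hence a common additive constant in all log-likelihoods) leaves the posterior unchanged.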


2.3  Parameter  Estimates  and  Stochastic  Convergence 

An estimate $\hat{s}_n$ on $S$ is a $\mathcal{U}_n$-measurable mapping
from $\Omega$ onto $S$.

A maximum likelihood (ML) estimate on $S$ is an estimate
$\hat{s}_n \in S$ such that

$$\{ f_s(z^n) \; ; \; s \in S \} \le f_{\hat{s}_n}(z^n)$$

A maximum a posteriori probability (MAP) estimate on $S$ is an
estimate $\hat{s}_n \in S$ such that

$$\{ f^b(s \mid z^n) \; ; \; s \in S \} \le f^b(\hat{s}_n \mid z^n)$$

Let $S$ be linear. Then a least-squares (LS) estimate on $S$ is an
estimate $\hat{s}_n \in S$ such that

$$E^b \{ (\tilde{s}_n - *)^T (\tilde{s}_n - *) \} \ge E^b \{ (\hat{s}_n - *)^T (\hat{s}_n - *) \}$$

for any estimate $\tilde{s}_n$ on $S$. $x^T$ denotes $x$ transposed and
$*$ denotes the true parameter, assumed to have the same dimension as $s$.

Let the true parameter be assumed to belong to a finite set
$\{ s_j \in R^m \; ; \; j \in K \}$, then the LS estimate on $R^m$ at
instant $n$ is the conditional expectation

$$E^{b,\mathcal{U}_n} * = \int_S s\, f^b(s \mid z^n)\, d\nu_0 = \sum_{j=0}^{p} s_j\, f^b(s_j \mid z^n) \tag{2.20}$$
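A minimal sketch of the three estimates on a finite parameter set may help fix ideas. It assumes that the log-likelihoods and the a posteriori probabilities have already been computed; the function names and the representation of parameters as tuples are illustrative assumptions, not the thesis's notation.

```python
def ml_estimate(params, log_likelihoods):
    """Parameter in the finite set that maximizes the likelihood."""
    j = max(range(len(params)), key=lambda i: log_likelihoods[i])
    return params[j]


def map_estimate(params, post):
    """Parameter in the finite set with the largest a posteriori probability."""
    j = max(range(len(params)), key=lambda i: post[i])
    return params[j]


def ls_estimate(params, post):
    """Posterior-weighted average of the parameters (conditional mean)."""
    dim = len(params[0])
    return tuple(
        sum(w * s[k] for s, w in zip(params, post)) for k in range(dim)
    )


if __name__ == "__main__":
    params = [(0.0,), (1.0,), (2.0,)]        # hypothetical scalar parameters
    post = [0.2, 0.5, 0.3]                   # hypothetical posterior weights
    print(map_estimate(params, post))        # a member of the finite set
    print(ls_estimate(params, post))         # generally not a member of the set
```

Note that the ML and MAP estimates always select a member of the finite set, whereas the weighted average generally lies outside it, which is why the LS estimate is taken on the surrounding Euclidean space.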


A stochastic sequence $\{x_n\}$ on $(\Omega, \mathcal{U}, P)$ is said to
converge almost everywhere (a.e.) to a random variable $x$ on
$(\Omega, \mathcal{U}, P)$ if

$$\lim_{n \to \infty} x_n = x \quad \text{a.e.}$$

A stochastic sequence $\{x_n\}$ on $(\Omega, \mathcal{U}, P)$ is said to
converge in the mean (or in $L^1$) to a random variable $x$ on
$(\Omega, \mathcal{U}, P)$ if

$$\lim_{n \to \infty} E|x_n - x| = 0$$

A vector-valued stochastic sequence $\{x_n\}$ on
$(\Omega, \mathcal{U}, P)$ is said to converge in the mean square (in
m.s. or in $L^2$) to a random vector $x$ on $(\Omega, \mathcal{U}, P)$ if

$$\lim_{n \to \infty} E|x_n - x|^2 = 0$$

where $|x| \triangleq (x^T x)^{1/2}$.

A sequence of parameter estimates $\{\hat{s}_n\}$ is said to be
consistent a.e. or in the mean square if it converges a.e. or in the
mean square to the true parameter.

We now present without proofs three well known results from
probability theory, which are used in this thesis.

Theorem 2.1 (Jensen's inequality, e.g. Bauer [1972], p. 322).

Let $x$ be a real integrable random variable on a probability space
$(\Omega, \mathcal{U}, P)$ with values in $R^1$ and let $g(x)$ be a
convex integrable function on $R^1$, then

$$g(E x) \le E\, g(x)$$

Theorem 2.2 (Fatou's Lemma, e.g. Bauer [1972], p. 71)

Let $\{x_n\}$ be an integrable stochastic sequence on
$(\Omega, \mathcal{U}, P)$ such that $x_n \ge 0$ a.e. for all $n$, then

$$E \liminf_{n} x_n \le \liminf_{n} E x_n$$
n - n 
Theorem 2.3 (Lebesgue's dominated convergence theorem, e.g. Chung
[1974], p. 42)

Let $\{x_n\}$ be an integrable stochastic sequence on
$(\Omega, \mathcal{U}, P)$. Then if

$$\lim_{n \to \infty} x_n = x \quad \text{a.e.}$$

where $x$ is an integrable random variable on $(\Omega, \mathcal{U}, P)$
and if there exists some integrable random variable $y$ on
$(\Omega, \mathcal{U}, P)$ such that

$$E y < \infty$$

and

$$|x_n| \le y \quad \text{a.e. for all } n$$

then

$$\lim_{n \to \infty} E x_n = E x.$$

2.4  Martingales  and  Martingale  Difference  Sequences 

Let $(\Omega, \mathcal{U}, P)$ be a probability space and let
$\{\mathcal{U}_n\}$ be an increasing family of $\sigma$-subalgebras in
$\mathcal{U}$. A $\mathcal{U}_n$-measurable stochastic sequence
$\{x_n\}$ on $(\Omega, \mathcal{U}, P)$ is called a
$\mathcal{U}_n$-martingale if for each $n$

(a) $E|x_n| < \infty$

(b) $E^{\mathcal{U}_{n-1}} x_n = x_{n-1}$ a.e.

If the equality in (b) is replaced by $\le$ then $\{x_n\}$ is called a
$\mathcal{U}_n$-supermartingale.

It can be shown (Doob [1953], p. 93) that the likelihood ratio sequences
$(dP_{s,n}/dP_{*,n})$, $s \in S$, defined in section 2.2 are $\mathcal{U}_n$-martingales
according to the measure $P_*$. Hence

    $$E_*^{\mathcal{U}_{n-1}} \frac{dP_{s,n}/dP_{s,n-1}}{dP_{*,n}/dP_{*,n-1}}
      = \frac{dP_{*,n-1}}{dP_{s,n-1}} \, E_*^{\mathcal{U}_{n-1}} \frac{dP_{s,n}}{dP_{*,n}}
      = \frac{dP_{*,n-1}}{dP_{s,n-1}} \cdot \frac{dP_{s,n-1}}{dP_{*,n-1}} = 1$$

Consequently, we have by (2.7), (2.6) and (2.2)

    $$E_*^{\mathcal{U}_{n-1}} h_*^s(z_n|Z^{n-1})
      = E_*^{\mathcal{U}_{n-1}} \frac{f_s(z_n|Z^{n-1})}{f_*(z_n|Z^{n-1})} = 1
      \quad \text{for each } s \in S$$                                   (2.21)


Theorem 2.4 (The martingale convergence theorem, e.g. Chung [1974],
p. 334, Bauer [1972], pp. 341-343)

Let $(x_n)$ be a $\mathcal{U}_n$-martingale on $(\Omega, \mathcal{U}, P)$ and let

    $$\sup_{n \ge 0} E x_n^+ < \infty$$

where

    $$x_n^+ = \sup(x_n, 0)$$

Then $(x_n)$ converges a.e. to a finite limit.

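Let $(\Omega, \mathcal{U}, P)$ be a probability space and let $(\mathcal{U}_n)$ be an increasing
family of $\sigma$-subalgebras in $\mathcal{U}$. A $\mathcal{U}_n$-measurable stochastic sequence $(x_n)$
on $(\Omega, \mathcal{U}, P)$ is called a $\mathcal{U}_n$-martingale difference sequence if it is
integrable and if

    $$E^{\mathcal{U}_{n-1}} x_n = 0 \quad \text{a.e.}$$

Let $(y_n)$ be a stochastic sequence on $(\Omega, \mathcal{U}, P)$ and let $(\mathcal{U}_n)$ be a sequence
of $\sigma$-subalgebras of $\mathcal{U}$, generated by $(y_1, \ldots, y_n)$. Then, clearly

    $$(y_n - E^{\mathcal{U}_{n-1}} y_n)$$

is a $\mathcal{U}_n$-martingale difference sequence. Also note that if $(x_n)$ is a
$\mathcal{U}_n$-martingale difference sequence then

    $$X_n \equiv \sum_{m=1}^{n} x_m$$

is a martingale. Indeed

    $$E^{\mathcal{U}_{n-1}} X_n = \sum_{m=1}^{n} E^{\mathcal{U}_{n-1}} x_m
      = \sum_{m=1}^{n-1} x_m + E^{\mathcal{U}_{n-1}} x_n
      = \sum_{m=1}^{n-1} x_m = X_{n-1}$$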
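The construction above (subtract the conditional mean to obtain a martingale difference sequence, then sum to obtain a martingale) can be checked exactly on a small example. The sketch below assumes a hypothetical dependent binary sequence $y_n$ of Polya-urn type, which is not taken from the thesis, and verifies $E^{\mathcal{U}_{n-1}} X_n = X_{n-1}$ by enumerating every past with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product


def p_one(prefix):
    """P(y_n = 1 | past = prefix) for a Polya-type urn (illustrative assumption)."""
    return Fraction(1 + sum(prefix), 2 + len(prefix))


def partial_sum(path):
    """X_n = sum of the differences x_m = y_m - E[y_m | y_1, ..., y_{m-1}]."""
    return sum(Fraction(y) - p_one(path[:m]) for m, y in enumerate(path))


def conditional_mean_next(prefix):
    """E[X_n | y_1, ..., y_{n-1} = prefix], averaging over the next draw y_n."""
    p = p_one(prefix)
    return p * partial_sum(prefix + (1,)) + (1 - p) * partial_sum(prefix + (0,))


# Check the martingale property for every possible past of length 0..5.
is_martingale = all(
    conditional_mean_next(prefix) == partial_sum(prefix)
    for n in range(6)
    for prefix in product((0, 1), repeat=n)
)
```

Because the arithmetic is exact, `is_martingale` tests the identity itself rather than a sampled approximation of it.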
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2.5  Stationarity  and  Ergodicity 

The purpose of this section is to provide definitions and convergence
results for ergodic sequences, which will be used in the thesis. It is
not intended to provide an elaborate presentation of the concept of
ergodicity. For a thorough development of ergodic theory the reader is
referred to, e.g., Doob [1953], Halmos [1956] and Chacon and Ornstein
[1959].

Consider a probability space $(\Omega, \mathcal{U}, P)$. A transformation $T$ from
$\Omega$ to $\Omega$ is said to be measure preserving if

    $$P(T^{-1}A) = P(A)$$

for all $A \in \mathcal{U}$.

Given a measure preserving transformation $T$, a $\mathcal{U}$-measurable event
$A$ is said to be invariant if

    $$T^{-1}A = A$$

Let $(x_n)$ be a stochastic sequence on $(\Omega, \mathcal{U}, P)$ with values in
$(R^{\ell}, B^{\ell})$, where $R^{\ell}$ is the $\ell$-dimensional Euclidean space and $B^{\ell}$ is the
$\sigma$-algebra of Borel sets of $R^{\ell}$. Let $B_\infty^{\ell}$ be the $\sigma$-algebra of Borel sets
of $R_\infty^{\ell}$ where $R_\infty^{\ell} = R^{\ell} \times R^{\ell} \times \cdots$ Then $(x_n)$ is said to be stationary if for
each $k \ge 1$

    $$P[(x_1, x_2, \ldots) \in C] = P[(x_{k+1}, x_{k+2}, \ldots) \in C]$$

for every $C \in B_\infty^{\ell}$,
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and  where  |R(k)|  denotes  the  determinant  of  R(k) . 


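Theorem 2.5 (The ergodic theorem, e.g. Doob [1953], p. 464, Halmos
[1956], p. 22, Wiener [1949], p. 16)

Let $(x_n)$ be an ergodic sequence on $(\Omega, \mathcal{U}, P)$ and let $f(x_n)$ be a
$\mathcal{U}$-measurable function such that $E|f(x_n)|$ is finite, then

    $$\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} f(x_k) = E f(x_1) \quad \text{a.e.}$$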

The  following  version  of  the  central  limit  theorem  of  probability 
theory  will  prove  useful  in  later  chapters. 

Theorem 2.6 (Billingsley [1961])

Let $(x_n)$ be an ergodic stochastic process on $(\Omega, \mathcal{U}, P)$ such that
$E x_n^2$ is finite and

    $$E^{\mathcal{U}_{n-1}} x_n = 0 \quad \text{a.e.}$$

(i.e. $(x_n)$ is an ergodic martingale difference sequence). Then the
distribution of $n^{-1/2} \sum_{k=1}^{n} x_k$ approaches the Gaussian distribution with
mean zero and variance $E x_n^2$.

2.6  Metric Spaces and Stochastic Metrics

Consider a set $S$ and a real-valued function $e$ on $S \times S$ which satisfies

    (i)    $e(s;s) = 0$ for any $s \in S$

    (ii)   $e(s;t) = e(t;s)$ for any $s, t \in S$

    (iii)  $e(s;t) \le e(s;r) + e(r;t)$ for any $s, t, r \in S$.

Then $e$ is called a pseudo metric on $S$. If in addition to (i), (ii) and
(iii) $e$ satisfies

    (iv)   $e(s;t) = 0$ ; $s, t \in S$ implies $s = t$

then $e$ is called a metric on $S$. The pair $(S, e)$ is called a metric
space.

Now consider a probability space $(\Omega, \mathcal{U}, P)$ and an increasing family
$(\mathcal{U}_n)$ of $\sigma$-subalgebras of $\mathcal{U}$. Let $(e_n)$ be a $(\mathcal{U}_n)$-measurable sequence of
functions on $S \times S$ such that each $e_n$ satisfies (i) - (iii) above. Then
we shall call $(e_n)$ a stochastic pseudo metric sequence on $S$. If
each $e_n$ satisfies (i) - (iv) above, we shall call $(e_n)$ a stochastic
metric sequence on $S$. (*)

(*) These definitions do not seem to have appeared in the literature
before.
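For a finite set $S$, conditions (i) - (iv) of section 2.6 can be checked by direct enumeration. The following is a minimal sketch under that assumption; the function names `is_pseudo_metric` and `is_metric`, the tolerance and the three test functions are hypothetical and serve only to show one function of each kind.

```python
from itertools import product


def is_pseudo_metric(points, e, tol=1e-12):
    """Check conditions (i)-(iii) of section 2.6 for e on a finite set."""
    for s in points:
        if abs(e(s, s)) > tol:                      # (i)   e(s;s) = 0
            return False
    for s, t in product(points, repeat=2):
        if abs(e(s, t) - e(t, s)) > tol:            # (ii)  symmetry
            return False
    for s, t, r in product(points, repeat=3):
        if e(s, t) > e(s, r) + e(r, t) + tol:       # (iii) triangle inequality
            return False
    return True


def is_metric(points, e, tol=1e-12):
    """Check (i)-(iv): a pseudo metric that also separates distinct points."""
    separates = all(
        abs(e(s, t)) > tol for s, t in product(points, repeat=2) if s != t
    )
    return is_pseudo_metric(points, e, tol) and separates


points = [0, 1, 2, 3]
absolute = lambda s, t: abs(s - t)          # a metric
parity = lambda s, t: abs(s % 2 - t % 2)    # pseudo metric only: e(0;2) = 0
squared = lambda s, t: (s - t) ** 2         # violates the triangle inequality
```

The `parity` example shows why condition (iv) has to be stated separately: two distinct points can be at zero distance without violating (i) - (iii).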


CHAPTER  III 


INFORMATION 


In this chapter we develop the notion of the information in a
sequence of observations favoring one parameter in a given parameter set
against another. We do not make the assumption, common in the derivation
of other information measures in information theory, that the true
parameter is included in a known set, or, equivalently, that the true
measure belongs to a known set of measures. The mean and the conditional
mean values of the discriminating information in a single observation
are shown to possess properties that will prove useful in the following
chapters. In particular, their absolute values are metrics, or distance
measures, on the parameter space. This provides a meaningful
measure of the relative closeness of parameters to the true parameter.
The new information measures are then compared with other measures
common in information theory.

3.1  The  Information  in  a Single  Observation 

Let $S$ be a parameter space and let $T = (* \cup S)$, where $*$ is the true
parameter. If for some pair of parameters $s, t \in T$

    $$f_s(Z^n) > f_t(Z^n)$$

or, equivalently,

    $$\log f_s(Z^n) > \log f_t(Z^n)$$

we say that the parameter $s$ is favored over the parameter $t$ by the
observations $Z^n$. Then $\log f_s(Z^n)$ may be regarded as a measure of the
information in $Z^n$ for selecting a parameter from the set $T$. The
difference

    $$\log f_s(Z^n) - \log f_t(Z^n) = \log h_t^s(Z^n)$$                    (3.1)

is then a measure of the information in $Z^n$ for selecting between $s$ and
$t$. If (3.1) is positive then $s$ is favored and if it is negative then $t$
is favored. The difference

    $$\log h_t^s(Z^n) - \log h_t^s(Z^{n-1}) = \log h_t^s(z_n|Z^{n-1})$$       (3.2)

is then a measure of the difference between the information favoring $s$
against $t$ at instant $n$ and the information favoring $s$ against $t$ at
instant $n-1$. It can then be regarded as a measure of the information
favoring $s$ against $t$ in the observation $z_n$. We define

    $$I_n(s;t) = E_*^{\mathcal{U}_{n-1}} \log h_t^s(z_n|Z^{n-1})$$                   (3.3)

as the conditional mean information in $z_n$ favoring $s$ against $t$ and

    $$\bar{I}_n(s;t) = E_* \log h_t^s(z_n|Z^{n-1})$$                          (3.4)

as the mean information in $z_n$ favoring $s$ against $t$. (A more general
form of (3.3) would be $I_n^A(s;t) = E_*^{A_n} \log h_t^s(z_n|Z^{n-1})$ for some sequence
$(A_n)$ such that $A_n \in \mathcal{U}_n$. However, for the purposes of this thesis we
use the information defined by (3.4).)
3.2  Properties  of  Information 

We now show some properties of the information measures defined
above that will prove to be useful in the following chapters.

Theorem 3.1

Let $S$ be a parameter space. Then for any $s \in S$ and for each $n \ge 0$
we have

    $$I_n(*;s) \ge 0 \quad \text{a.e.}$$

and

    $$\bar{I}_n(*;s) \ge 0$$

with equality if and only if $f_s(z_n|Z^{n-1}) = f_*(z_n|Z^{n-1})$ a.e.

Proof

    $$I_n(*;s) = -E_*^{\mathcal{U}_{n-1}} \log h_*^s(z_n|Z^{n-1})$$

Using the inequality

    $$\log a \le a - 1 \ ; \quad \log a = a - 1 \text{ if and only if } a = 1$$     (3.5)

we get

    $$I_n(*;s) \ge 1 - E_*^{\mathcal{U}_{n-1}} h_*^s(z_n|Z^{n-1}) = 0 \quad \text{a.e.}$$       (3.6)

where the second equality follows from (2.21). To show that equality
holds only if $f_s(z_n|Z^{n-1}) = f_*(z_n|Z^{n-1})$ a.e. (sufficiency is trivial)
suppose that

    $$I_n(*;s) = 1 - E_*^{\mathcal{U}_{n-1}} h_*^s(z_n|Z^{n-1}) \quad \text{a.e.}$$

i.e.

    $$E_*^{\mathcal{U}_{n-1}} \left\{ h_*^s(z_n|Z^{n-1}) - \log h_*^s(z_n|Z^{n-1}) - 1 \right\} = 0$$

By (2.0) we then have

    $$\int \left[ h_*^s(z_n|Z^{n-1}) - \log h_*^s(z_n|Z^{n-1}) - 1 \right] dP_* = 0$$     (3.7)

(3.7) and (3.5) together give

    $$h_*^s(z_n|Z^{n-1}) = 1 \quad \text{a.e.}$$

or

    $$f_s(z_n|Z^{n-1}) = f_*(z_n|Z^{n-1}) \quad \text{a.e.}$$                      (3.8)

Hence, equality in (3.6) holds if and only if (3.8) holds.

Since $I_n(*;s) \ge 0$, we have

    $$\bar{I}_n(*;s) = E_* I_n(*;s) \ge 0$$

with equality if and only if $I_n(*;s) = 0$ a.e., which, as shown above,
occurs if and only if $f_s(z_n|Z^{n-1}) = f_*(z_n|Z^{n-1})$ a.e.
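Theorem 3.1 can be illustrated for a single observation taking finitely many values, in which case the conditioning on $Z^{n-1}$ drops out and $\bar{I}(*;s) = \sum_x f_*(x) \log [f_*(x)/f_s(x)]$. The sketch below is a hypothetical discrete example (the probability vectors and names are assumptions made for illustration); it also evaluates the lower bound of (3.6).

```python
import math


def mean_information(p_true, p_s, p_t):
    """Mean information E_* log(p_s(x) / p_t(x)) favoring s against t."""
    return sum(w * math.log(a / b) for w, a, b in zip(p_true, p_s, p_t))


# Hypothetical true density and candidate densities on three points.
p_true = [0.5, 0.3, 0.2]
candidates = {
    "s1": [0.4, 0.4, 0.2],
    "s2": [0.1, 0.1, 0.8],
    "same": [0.5, 0.3, 0.2],
}

# I(*; s) for each candidate: nonnegative, and zero when p_s equals p_true.
info_from_truth = {
    name: mean_information(p_true, p_true, p_s)
    for name, p_s in candidates.items()
}

# Lower bound of (3.6): 1 - E_* h, with h = p_s / p_true, so E_* h = sum(p_s) = 1.
lower_bounds = {
    name: 1.0 - sum(w * (a / w) for w, a in zip(p_true, p_s))
    for name, p_s in candidates.items()
}
```

Here `info_from_truth` orders the candidates by their closeness to the true density, which is the interpretation developed in Theorem 3.3 and Corollary 3.2.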


Corollary 3.1

Suppose that $r \in S$ is the true parameter. Then for any $t \in S$,
$I_n(s;t)$ and $\bar{I}_n(s;t)$ are maximized on $S$ at $s = r$. This maximum is
unique unless for some $s \in S$, $f_s(z_n|Z^{n-1}) = f_r(z_n|Z^{n-1})$ a.e.

Proof

By theorem 3.1 we have

    $$I_n(r;t) - I_n(s;t) = I_n(r;s) \ge 0 \quad \text{a.e.}$$

and

    $$\bar{I}_n(r;t) - \bar{I}_n(s;t) = \bar{I}_n(r;s) \ge 0$$

with equality if and only if $f_s(z_n|Z^{n-1}) = f_r(z_n|Z^{n-1})$ a.e. The
assertion follows.
Theorem 3.2

The sequence $(|\bar{I}_n(s;t)|)$; $s, t \in S$ is a sequence of pseudo metrics
on $S$. It is a sequence of metrics on $S$ if and only if $\bar{I}_n(s;t) = 0$ implies
$s = t$. The sequence $(|I_n(s;t)|)$; $s, t \in S$ is a stochastic sequence of
pseudo metrics on $S$. It is a stochastic sequence of metrics if and only
if $I_n(s;t) = 0$ implies $s = t$.

Proof

To prove that $|\bar{I}_n(s;t)|$ is a pseudo metric on $S$ for each $n$ we have
to show (see section 2.6) that for each $n$ it satisfies the following
conditions.

    (i)    $|\bar{I}_n(s;s)| = 0$ for any $s \in S$

    (ii)   $|\bar{I}_n(s;t)| = |\bar{I}_n(t;s)|$ for any $s, t \in S$

    (iii)  $|\bar{I}_n(s;t)| \le |\bar{I}_n(s;r)| + |\bar{I}_n(r;t)|$ for any $s, t, r \in S$.

We have

    $$h_s^s(z_n|Z^{n-1}) = 1$$

Hence

    $$\log h_s^s(z_n|Z^{n-1}) = 0$$

Then also

    $$E_* \log h_s^s(z_n|Z^{n-1}) = 0$$

and (i) follows. Also

    $$\bar{I}_n(s;t) = -\bar{I}_n(t;s)$$

and (ii) follows.

Condition (iii) is proved as follows

    $$|\bar{I}_n(s;r)| + |\bar{I}_n(r;t)|
      = |E_* \log h_r^s(z_n|Z^{n-1})| + |E_* \log h_t^r(z_n|Z^{n-1})|$$
    $$= |E_* \log f_s(z_n|Z^{n-1}) - E_* \log f_r(z_n|Z^{n-1})|
      + |E_* \log f_r(z_n|Z^{n-1}) - E_* \log f_t(z_n|Z^{n-1})|$$
    $$\ge |E_* \log f_s(z_n|Z^{n-1}) - E_* \log f_t(z_n|Z^{n-1})|
      = |E_* \log h_t^s(z_n|Z^{n-1})| = |\bar{I}_n(s;t)|$$

If in addition to (i), (ii) and (iii) $|\bar{I}_n(s;t)|$ satisfies

    (iv)   $|\bar{I}_n(s;t)| = 0$ ; $s, t \in S$ implies $s = t$

then $|\bar{I}_n(s;t)|$ is a metric on $S$. The assertion follows for $|\bar{I}_n(s;t)|$.
The result for $|I_n(s;t)|$ is obtained by showing that conditions (i)-(iv)
above hold a.e., replacing $E_*$ by $E_*^{\mathcal{U}_{n-1}}$ and following the same steps.


Theorem 3.3

For any $t, r \in S$ and for each $n \ge 0$ the sequences $(|I_n(*;t)|)$ and
$(|\bar{I}_n(*;t)|)$ satisfy the properties (i) - (iii) above. They satisfy
(iv) if and only if $f_t(z_n|Z^{n-1}) = f_*(z_n|Z^{n-1})$ a.e. implies $t = *$.

Proof

The proof of properties (i)-(iii) is obtained precisely as in the
proof of theorem 3.2. (iv) is satisfied if and only if
$f_t(z_n|Z^{n-1}) = f_*(z_n|Z^{n-1})$ a.e. implies $t = *$ by theorem 3.1.

The variables $|I_n(*;t)|$ and $|\bar{I}_n(*;t)|$ ; $t \in S$ are then distance
measures from the true parameter $*$ to points in the parameter set $S$.
They can be regarded as extensions of the metrics $|I_n(s;t)|$ and $|\bar{I}_n(s;t)|$
on $S$ to the set $T = (* \cup S)$.

Corollary 3.2

Let $s, t \in S$ be any pair of parameters in the parameter space $S$.
Then $s$ is closer to the true parameter $*$ than $t$ in the metric $|I_n(\cdot\,;\cdot)|$
if and only if $I_n(s;t) > 0$ a.e. and in the metric $|\bar{I}_n(\cdot\,;\cdot)|$ if and
only if $\bar{I}_n(s;t) > 0$.

Proof

$s$ is closer to the true parameter than $t$ in the metric $|I_n(\cdot\,;\cdot)|$ if
and only if

    $$|I_n(*;s)| < |I_n(*;t)| \quad \text{a.e.}$$

But by theorem 3.1

    $$|I_n(*;s)| = I_n(*;s) \quad \text{a.e. for any } s \in S$$

Hence, $s$ is closer to the true parameter than $t$ if and only if

    $$I_n(*;s) < I_n(*;t) \quad \text{a.e.}$$

or

    $$I_n(*;t) - I_n(*;s) = I_n(s;t) > 0$$

To show that $s$ is closer to the true parameter than $t$ in the metric
$|\bar{I}_n(\cdot\,;\cdot)|$ an identical procedure can be followed using $\bar{I}_n(s;t)$ instead
of $I_n(s;t)$.

Example 3.1

Let $x$ be a random variable, whose probability density is known to
belong to the set

    $$f_i(x) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \, e^{-x^2 / 2\sigma_i^2} \ ; \quad i = 0, 1, 2$$      (3.9)

Suppose that $i = 0$ is the true parameter, i.e., that $x$ is actually
distributed according to $f_0(x)$. The mean information in a single
observation $x$ favoring one parameter against the other is found to be

    $$I(1;0) = \bar{I}(1;0) = \frac{1}{2} \log \frac{\sigma_0^2}{\sigma_1^2}
      + \frac{1}{2} \left( 1 - \frac{\sigma_0^2}{\sigma_1^2} \right)$$

    $$I(2;0) = \bar{I}(2;0) = \frac{1}{2} \log \frac{\sigma_0^2}{\sigma_2^2}
      + \frac{1}{2} \left( 1 - \frac{\sigma_0^2}{\sigma_2^2} \right)$$

    $$I(1;2) = \bar{I}(1;2) = \frac{1}{2} \log \frac{\sigma_2^2}{\sigma_1^2}
      + \frac{\sigma_0^2}{2} \left( \frac{1}{\sigma_2^2} - \frac{1}{\sigma_1^2} \right)$$

Note that $I(i;j) \to 0$ as $\sigma_i \to \sigma_j$.

Theorem 3.1 is verified as follows

    $$I(1;0) = \frac{1}{2} \left[ 1 - \frac{\sigma_0^2}{\sigma_1^2} + \log \frac{\sigma_0^2}{\sigma_1^2} \right] \le 0$$

where we have used the inequality $1 - a \le -\log a$.

Similarly

    $$I(2;0) \le 0$$

To verify corollary 3.1 we check whether

    $$I(2;1) \ge I(2;0)$$

    $$I(2;1) - I(2;0) = I(0;1) = -I(1;0) \ge 0$$

Similarly

    $$I(1;2) \ge I(1;0).$$

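Next, we check the conditions under which the parameter 1 is closer
to the parameter 0 than the parameter 2, in the metric senses defined by
theorem 3.2. By corollary 3.2 it suffices to have

    $$I(1;2) > 0$$

i.e.

    $$\log \frac{\sigma_2^2}{\sigma_1^2} + \sigma_0^2 \left( \frac{1}{\sigma_2^2} - \frac{1}{\sigma_1^2} \right) > 0.$$      (3.10)

(3.10) relates the relative closeness of the parameters 1 and 2 to the
true parameter 0 (see corollary 3.2) with the covariances associated
with the parameters. It is interesting to check, then, whether the
closeness of the covariances implies closeness of the parameters in
the information metric sense, i.e. whether

    $$|\sigma_1^2 - \sigma_0^2| < |\sigma_2^2 - \sigma_0^2|$$                              (3.11)

implies that the parameter 1 is closer than the parameter 2 to the true
parameter 0, i.e. that

    $$I(1;2) > 0.$$

In general, (3.11) does not imply (3.10), which depends on the numerical
values of $\sigma_0$, $\sigma_1$ and $\sigma_2$. However, (3.11) does imply (3.10) in two
cases, namely:

Case 1:  $\sigma_0 < \sigma_1 < \sigma_2$

Clearly, (3.11) is satisfied. Using the inequality

    $$\log a < a - 1 \quad \text{for } a \ne 1$$                             (3.12)

we have

    $$\log \frac{\sigma_1^2}{\sigma_2^2} < \frac{\sigma_1^2}{\sigma_2^2} - 1$$

or

    $$\log \frac{\sigma_2^2}{\sigma_1^2} > 1 - \frac{\sigma_1^2}{\sigma_2^2} > 0$$                    (3.13)

and since $\sigma_0^2 / \sigma_1^2 < 1$ we further have

    $$\log \frac{\sigma_2^2}{\sigma_1^2} > \frac{\sigma_0^2}{\sigma_1^2} \left( 1 - \frac{\sigma_1^2}{\sigma_2^2} \right)
      = \sigma_0^2 \left( \frac{1}{\sigma_1^2} - \frac{1}{\sigma_2^2} \right)$$

Hence

    $$I(1;2) = \frac{1}{2} \left[ \log \frac{\sigma_2^2}{\sigma_1^2}
      + \sigma_0^2 \left( \frac{1}{\sigma_2^2} - \frac{1}{\sigma_1^2} \right) \right] > 0$$

Case 2:  $\sigma_2 < \sigma_1 < \sigma_0$

(3.11) is again satisfied. By (3.12) we have

    $$\log \frac{\sigma_2^2}{\sigma_1^2} > 1 - \frac{\sigma_1^2}{\sigma_2^2}, \quad \text{where now } 1 - \frac{\sigma_1^2}{\sigma_2^2} < 0$$

and since $\sigma_0^2 / \sigma_1^2 > 1$ we further have

    $$\log \frac{\sigma_2^2}{\sigma_1^2} > \frac{\sigma_0^2}{\sigma_1^2} \left( 1 - \frac{\sigma_1^2}{\sigma_2^2} \right)
      = \sigma_0^2 \left( \frac{1}{\sigma_1^2} - \frac{1}{\sigma_2^2} \right)$$

so that again $I(1;2) > 0$.

Then

    $$I(1;2) = \bar{I}(1;2) = \frac{1}{2} \left[ -1 + 2 \left( 1 - \dots \right) \right] = 0.134 > 0$$

But if

    $$\sigma_1^2 = 4$$

then we have

    $$|\sigma_1^2 - \sigma_0^2| > |\sigma_2^2 - \sigma_0^2|$$

But

    $$I(2;1) = \bar{I}(2;1) = \frac{1}{2} \left[ 1.386 + 2 \left( \frac{1}{4} - 1 \right) \right] = -0.057 < 0.$$

Hence closeness of the covariances to the true covariance does not imply
closeness of the parameters to the true parameter in the metrics $|I(\cdot\,;\cdot)|$
and $|\bar{I}(\cdot\,;\cdot)|$ in general, except for cases 1 and 2 above.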
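The quantities of Example 3.1 are easy to evaluate numerically. The sketch below implements the closed form $\bar{I}(s;t) = \frac{1}{2}[\log(\sigma_t^2/\sigma_s^2) + \sigma_0^2(1/\sigma_t^2 - 1/\sigma_s^2)]$ for the zero-mean Gaussian family. The variance values in the counterexample ($\sigma_0^2 = 2$, $\sigma_1^2 = 4$, $\sigma_2^2 = 1$) are inferred here from the printed numbers 1.386 and $2(\frac{1}{4} - 1)$ and should be read as an assumption; the values used for the two cases are arbitrary illustrations.

```python
import math


def gaussian_information(var_s, var_t, var_true):
    """Mean information E_0 log(f_s / f_t) for zero-mean Gaussian densities.

    var_s, var_t: variances of the two candidate densities.
    var_true: variance of the density actually generating the observation.
    """
    return 0.5 * (math.log(var_t / var_s)
                  + var_true * (1.0 / var_t - 1.0 / var_s))


# Counterexample (values inferred from the printed arithmetic, see lead-in):
# parameter 2 has the closer variance, yet parameter 1 is closer in information.
var0, var1, var2 = 2.0, 4.0, 1.0
variance_gap_1 = abs(var1 - var0)
variance_gap_2 = abs(var2 - var0)
info_2_over_1 = gaussian_information(var2, var1, var0)

# Case 1 (sigma_0 < sigma_1 < sigma_2) and Case 2 (sigma_2 < sigma_1 < sigma_0),
# with arbitrary illustrative variances.
case1 = gaussian_information(var_s=2.0, var_t=3.0, var_true=1.0)
case2 = gaussian_information(var_s=2.0, var_t=1.0, var_true=3.0)
```

With these values `info_2_over_1` is negative even though `variance_gap_2` is smaller than `variance_gap_1`, which is the point of the example: ordering by variance error and ordering by information distance need not agree.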

We shall use the notation

    $$\delta_n(s;t) \equiv |I_n(s;t)|$$

and

    $$d_n(s;t) \equiv |\bar{I}_n(s;t)|$$

Then we have sequences of metric spaces

    $$(S, \delta_n) \ ; \quad (S, d_n)$$

where $S$ is the parameter set. Note that while $I_n(s;t)$ and $\delta_n(s;t)$ are
$\mathcal{U}_{n-1}$-measurable random variables, $\bar{I}_n(s;t)$ and $d_n(s;t)$ are not random
variables. We shall see that $I_n(s;t)$ and $\delta_n(s;t)$ are useful for purposes of
analysis. The metric $d_n(s;t)$ will prove particularly useful when it is
constant in time, as will prove to be the case for ergodic observation
sequences. The parameter metric space can then be denoted $(S, d)$.

3.3  Comparison  with  Other  Information  Measures 

Attempts by statisticians and engineers to assign quantitative
measures to the intuitive notion of information have resulted over the
years in many different definitions of information. Information measures
can, in essence, be classified in two different categories. One is
characterized by the Shannon entropy, which has proved useful in
communication and source-coding theory, sometimes termed information theory.
The other is characterized by Fisher's and Kullback's information measures,
which have been more popular in statistical circles. Our information
measures fall in the second category. It seems that different
permutations of Fisher's or Kullback's information measures result from
different interpretations of a given set of data, which in turn reflect the
intended application. Our version of information seems to be the most
general, since, unlike other definitions, it does not assume that the
true parameter belongs to the parameter set under consideration. However,
special care must be taken in evaluating the advantages of one definition
of information over another.

The information measures defined in this chapter prove very useful
in the analysis of the asymptotic behavior of parameter estimates. They
provide insight into the convergence of the estimates in the presence and
in the absence of the true parameter. However, they can only be computed
if the true parameter is known. Nevertheless their application is not
limited to analysis, as will be evident in Chapter 5 where we consider
several model selection problems. On the other hand, several other
information measures which are useful in given applications, such as
signal detection, do not possess properties which are useful for
analytical purposes, such as the metric property. In the rest of this section
we briefly discuss a few information measures, common in the information
theoretic and the statistical literature and relate them to the
information measures defined in section 3.1.


3.3.1  Kullback's Information, the Divergence, the Bhattacharyya
Distance and the Ambiguity Function

Kullback [1959] defined the mean information for discriminating in
favor of one hypothesis $H_1$ against another, $H_2$, given an observation $x$ as

    $$I_K(1;2) = \int \log \frac{f_1(x)}{f_2(x)} \, d\mu_1(x)$$

where $\mu_1$ is a probability measure corresponding to $H_1$. $f_1$ is the density
of $\mu_1$ with respect to some measure $\lambda$ and $f_2$ is the density with respect
to $\lambda$ of $\mu_2$, a probability measure corresponding to $H_2$. The divergence
between $H_1$ and $H_2$, first introduced by Jeffreys [1946] and employed by
Kullback [1959] is defined as:

    $$J(1;2) = I_K(1;2) + I_K(2;1)
      = \int [f_1(x) - f_2(x)] \log \frac{f_1(x)}{f_2(x)} \, d\lambda(x)$$
    $$= \int \log \frac{f_1(x)}{f_2(x)} \, d\mu_1(x) + \int \log \frac{f_2(x)}{f_1(x)} \, d\mu_2(x)$$

In contrast, $I(1;2)$, defined by (3.3) would be written as

    $$I(1;2) = \int \log \frac{f_1(x)}{f_2(x)} \, d\mu(x)$$

where $\mu(x)$, the correct probability measure may be different from both
$\mu_1(x)$ and $\mu_2(x)$.

The Bhattacharyya distance (Bhattacharyya [1943]) between two
densities $f_1(x)$ and $f_2(x)$ of an observation $x$ is

    $$B = -\ln \int [f_1(x) f_2(x)]^{1/2} \, d\lambda$$

where $\lambda$ is the Lebesgue measure on the space of $x$. Properties of the
Bhattacharyya distance and the divergence were studied and compared by
Kailath [1967], and they were found to be particularly suitable for
signal detection in communication. However, Kullback's information,
the divergence and the Bhattacharyya distance do not satisfy the triangle
inequality and thus fail as metrics on the parameter (or hypothesis)
space. In contrast, the metric property of the information measures
introduced in section 3.1 follows from the consistent use of the true
probability measure throughout, whereas Kullback's information, the
divergence and the Bhattacharyya distance are defined using different
measures.
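The failure of the triangle inequality can be seen on a small discrete example. The sketch below evaluates Kullback's information, the divergence and the Bhattacharyya distance for three two-point distributions, and compares them with the distance $|E_* \log (f_s/f_t)|$ of section 3.1 taken under a single reference measure. The distributions `P`, `Q`, `R`, the reference measure `p_true` and all function names are hypothetical choices made for illustration.

```python
import math


def kullback(p, q):
    """Kullback's mean information I_K(p;q), expectation taken under p."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))


def divergence(p, q):
    """Jeffreys' divergence J(p;q) = I_K(p;q) + I_K(q;p)."""
    return kullback(p, q) + kullback(q, p)


def bhattacharyya(p, q):
    """Bhattacharyya distance -ln sum sqrt(p q)."""
    return -math.log(sum(math.sqrt(a * b) for a, b in zip(p, q)))


def information_distance(p_true, p_s, p_t):
    """|E_* log(p_s / p_t)|, with the expectation under one fixed true measure."""
    return abs(sum(w * math.log(a / b) for w, a, b in zip(p_true, p_s, p_t)))


P, Q, R = (0.5, 0.5), (0.25, 0.75), (0.1, 0.9)
p_true = (0.4, 0.6)


def triangle_holds(dist):
    """True if dist(P, R) <= dist(P, Q) + dist(Q, R) for the three points above."""
    return dist(P, R) <= dist(P, Q) + dist(Q, R) + 1e-12


violations = {
    "kullback": not triangle_holds(kullback),
    "divergence": not triangle_holds(divergence),
    "bhattacharyya": not triangle_holds(bhattacharyya),
    "information": not triangle_holds(
        lambda a, b: information_distance(p_true, a, b)),
}
```

The last entry cannot be a violation for any choice of `p_true`, because $|E_* \log f_s - E_* \log f_t|$ is the absolute difference of two numbers computed under the same measure.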

The ambiguity in an observation $x$ between a parameter $s$ and the true
parameter $*$ is defined as

    $$\gamma_s = -E_* \log f_s(x)$$

The ambiguity function $\gamma_s$ has been found useful in the analyses of
error in radar applications (Woodward [1953]). In fact

    $$I(s;t) = \gamma_t - \gamma_s$$

Hence, the information between two parameters as defined in this thesis
is the difference between their ambiguities.


3.3.2  Fisher's  Information 

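We shall now show that the information measures introduced in
section 3.1 are related to Fisher's information measure (Fisher [1956],
Savage [1954]). We follow a similar comparison between Kullback's and
Fisher's information measures (Kullback [1959]). However, in order to
relate measures of the same quantity, we define Fisher's information in
a single observation $z_n$.

Let $S \subset R^k$ be the parameter space. Suppose that for any $s \in S$ the
following regularity conditions (Cramér [1946], Gurland [1954]) hold for
all $i, j = 1, \ldots, k$

    1)  $$\left| \frac{\partial \log f_s(z_n|Z^{n-1})}{\partial s^i} \right| < F_1(Z^n) \ ; \quad
          \left| \frac{\partial^2 \log f_s(z_n|Z^{n-1})}{\partial s^i \, \partial s^j} \right| < F_2(Z^n)$$

where the partial derivatives are assumed to exist and $F_1(Z^n)$ and $F_2(Z^n)$
are integrable random variables.

    2)  $$\int \frac{1}{f_s(z_n|Z^{n-1})} \frac{\partial f_s(z_n|Z^{n-1})}{\partial s^i} \, dP_*
          = \int \frac{1}{f_s(z_n|Z^{n-1})} \frac{\partial^2 f_s(z_n|Z^{n-1})}{\partial s^i \, \partial s^j} \, dP_* = 0$$

We define Fisher's information in a single observation at a
parameter point $s$ as a matrix $\bar{I}_n^F(s)$, whose elements are

    $$[\bar{I}_n^F(s)]_{ij} = E_* \left\{
        \left( \frac{1}{f_s(z_n|Z^{n-1})} \frac{\partial f_s(z_n|Z^{n-1})}{\partial s^i} \right)
        \left( \frac{1}{f_s(z_n|Z^{n-1})} \frac{\partial f_s(z_n|Z^{n-1})}{\partial s^j} \right) \right\}$$

Consider a point $s \in S$ and a close point $s + \Delta s \in S$. Using Taylor's
expansion to second order we have

    $$\log f_{s+\Delta s}(z_n|Z^{n-1}) - \log f_s(z_n|Z^{n-1})
      \approx \sum_{i=1}^{k} \Delta s^i \frac{\partial \log f_s(z_n|Z^{n-1})}{\partial s^i}
      + \frac{1}{2} \sum_{i=1}^{k} \sum_{j=1}^{k} \Delta s^i \Delta s^j
        \frac{\partial^2 \log f_s(z_n|Z^{n-1})}{\partial s^i \, \partial s^j}$$

But

    $$\frac{\partial \log f_s(z_n|Z^{n-1})}{\partial s^i}
      = \frac{1}{f_s(z_n|Z^{n-1})} \frac{\partial f_s(z_n|Z^{n-1})}{\partial s^i}$$

    $$\frac{\partial^2 \log f_s(z_n|Z^{n-1})}{\partial s^i \, \partial s^j}
      = \frac{1}{f_s(z_n|Z^{n-1})} \frac{\partial^2 f_s(z_n|Z^{n-1})}{\partial s^i \, \partial s^j}
      - \frac{1}{f_s^2(z_n|Z^{n-1})} \frac{\partial f_s(z_n|Z^{n-1})}{\partial s^i}
        \frac{\partial f_s(z_n|Z^{n-1})}{\partial s^j}$$

The information in $z_n$ favoring $s$ against a close point $s + \Delta s$ as
defined by (3.4) is

    $$\bar{I}_n(s; s + \Delta s) = \int \log \frac{f_s(z_n|Z^{n-1})}{f_{s+\Delta s}(z_n|Z^{n-1})} \, dP_*$$
    $$\approx -\sum_{i=1}^{k} \Delta s^i \int \frac{\partial \log f_s(z_n|Z^{n-1})}{\partial s^i} \, dP_*
      - \frac{1}{2} \sum_{i=1}^{k} \sum_{j=1}^{k} \Delta s^i \Delta s^j
        \int \frac{\partial^2 \log f_s(z_n|Z^{n-1})}{\partial s^i \, \partial s^j} \, dP_*$$
    $$= -\sum_{i=1}^{k} \Delta s^i \int \frac{1}{f_s} \frac{\partial f_s}{\partial s^i} \, dP_*
      - \frac{1}{2} \sum_{i=1}^{k} \sum_{j=1}^{k} \Delta s^i \Delta s^j
        \int \left[ \frac{1}{f_s} \frac{\partial^2 f_s}{\partial s^i \, \partial s^j}
        - \frac{1}{f_s^2} \frac{\partial f_s}{\partial s^i} \frac{\partial f_s}{\partial s^j} \right] dP_*$$
    $$= \frac{1}{2} \sum_{i=1}^{k} \sum_{j=1}^{k} \Delta s^i \Delta s^j \, [\bar{I}_n^F(s)]_{ij}$$

where $f_s$ abbreviates $f_s(z_n|Z^{n-1})$ and the last equality is obtained by
the regularity condition 2) above. Hence, the information in a single
observation is related to Fisher's information in a single observation by

    $$\bar{I}_n(s; s + \Delta s) \approx \frac{1}{2} \Delta s^T \, \bar{I}_n^F(s) \, \Delta s$$

Defining similarly the conditional Fisher information in a single
observation as a matrix $I_n^F(s)$ whose elements are

    $$[I_n^F(s)]_{ij} = E_*^{\mathcal{U}_{n-1}} \left\{
        \left( \frac{1}{f_s(z_n|Z^{n-1})} \frac{\partial f_s(z_n|Z^{n-1})}{\partial s^i} \right)
        \left( \frac{1}{f_s(z_n|Z^{n-1})} \frac{\partial f_s(z_n|Z^{n-1})}{\partial s^j} \right) \right\}$$

we get, using a similar procedure

    $$I_n(s; s + \Delta s) \approx \frac{1}{2} \Delta s^T \, I_n^F(s) \, \Delta s$$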
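The quadratic relation between the information and Fisher's information can be checked on a scalar example. The sketch below assumes a zero-mean Gaussian observation parameterized by its variance $s$, with $s$ itself the true variance, so that both the exact information $\bar{I}(s; s + \Delta s)$ and the Fisher information $1/(2s^2)$ are available in closed form. The parameterization and the function names are assumptions made for illustration.

```python
import math


def information_variance_shift(s, delta):
    """Exact mean information favoring variance s against s + delta,
    for a zero-mean Gaussian whose true variance is s."""
    return 0.5 * (math.log((s + delta) / s) + s / (s + delta) - 1.0)


def fisher_information_variance(s):
    """Fisher information of a zero-mean Gaussian with respect to its variance."""
    return 1.0 / (2.0 * s * s)


s = 2.0
deltas = (0.2, 0.02, 0.002)

# Ratio of the exact information to the quadratic form (1/2) * delta^2 * I_F(s).
ratios = [
    information_variance_shift(s, d)
    / (0.5 * d * d * fisher_information_variance(s))
    for d in deltas
]
```

The ratios approach 1 as $\Delta s$ shrinks, which is what the second-order expansion predicts; the residual discrepancy is of first order in $\Delta s / s$.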

3.3.3  Self  Information  and  Entropy 

To complete this discussion, placing the information measures
motivated and defined in this chapter in perspective with respect to
other measures found in the literature, we mention two other measures
which are quite common in information theory, namely, the self
information and the entropy (e.g. Fano [1961] and Gallager [1968]). The
definition of these measures is based on the Bayesian assumption (see
Chapter 2).

Consider a parameter set $S$. The self information in the
measurements $Z^n$ about a parameter $s \in S$ is defined as

    $$I_n^S(s) \equiv -\log f^b(s|Z^n)$$                                   (3.14)

A comparative measure of information can then be obtained by taking the
difference of the self information corresponding to two parameters
$s, t \in S$

    $$\Delta I_n^S(s;t) \equiv I_n^S(s) - I_n^S(t) = -\log \frac{f^b(s|Z^n)}{f^b(t|Z^n)}$$

The self information difference between $s$ and $t$ in a single observation
$z_n$ can be obtained, using (2.6) and (2.19) as

    $$\Delta I_n^S(s;t) - \Delta I_{n-1}^S(s;t)
      = -\log \frac{f_s(z_n|Z^{n-1})}{f_t(z_n|Z^{n-1})} = -\log h_t^s(z_n|Z^{n-1})$$      (3.15)

Taking expectation and conditional expectation of (3.15) with respect to
the true measure one gets

    $$E_* \left\{ \Delta I_n^S(s;t) - \Delta I_{n-1}^S(s;t) \right\} = -\bar{I}_n(s;t)$$             (3.16)

and

    $$E_*^{\mathcal{U}_{n-1}} \left\{ \Delta I_n^S(s;t) - \Delta I_{n-1}^S(s;t) \right\} = -I_n(s;t)$$      (3.17)

Hence, the mean and the conditional mean values of the self information
difference in a single observation are the negative values of the
information measures defined in section 3.1. (The sign is, of course, of no
significance since the self information defined by (3.14) is in fact
lack of information, and would become positive information, in the sense
meant in this chapter, by inverting the sign.)

Note that in (3.17) the expectation is taken with respect to the
correct probability measure $P_*$, independently of whether the correct
parameter even belongs to the set $S$. If, on the other hand one makes
the assumption that the true parameter belongs to a finite set, say
$\{s_j ; j \in K = (0, \ldots, p)\}$, and takes a conditional expectation given $Z^n$ of
(3.14), then one gets

    $$E_b^{\mathcal{U}_n} I_n^S(s) = -\sum_{j=0}^{p} f^b(s_j|Z^n) \log f^b(s_j|Z^n) \equiv H(Z^n)$$      (3.18)
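Relation (3.15) between the self information difference and the one-step likelihood ratio can be verified numerically. The sketch below assumes, purely for illustration, independent unit-variance Gaussian observations with a finite set of candidate means, so that $f_s(z_n|Z^{n-1}) = f_s(z_n)$; the prior, the observations and the function names are hypothetical.

```python
import math


def gaussian_log_density(x, mean, var=1.0):
    """Log of the Gaussian density with the given mean and variance."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def posterior(prior, means, observations):
    """Posterior probabilities of the candidate means (Bayes rule, log domain)."""
    log_post = [
        math.log(p) + sum(gaussian_log_density(z, m) for z in observations)
        for p, m in zip(prior, means)
    ]
    top = max(log_post)
    weights = [math.exp(v - top) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def self_information_difference(prior, means, observations, s, t):
    """Delta I^S_n(s;t) = -log( f^b(s|Z^n) / f^b(t|Z^n) )."""
    post = posterior(prior, means, observations)
    return -math.log(post[s] / post[t])


prior = [0.5, 0.3, 0.2]
means = [0.0, 1.0, 2.0]
observations = [0.3, -1.2, 0.8, 2.1]
s, t = 0, 2

# Left side of (3.15): change in the self information difference due to z_n.
lhs = (self_information_difference(prior, means, observations, s, t)
       - self_information_difference(prior, means, observations[:-1], s, t))

# Right side of (3.15): minus the log likelihood ratio of the last observation.
z_n = observations[-1]
rhs = -(gaussian_log_density(z_n, means[s]) - gaussian_log_density(z_n, means[t]))
```

The prior and the normalizing constant of the posterior cancel in the difference, which is why only the likelihood ratio of the newest observation remains.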


CHAPTER  IV 


CONVERGENCE  OF  MAXIMUM  LIKELIHOOD  AND  BAYESIAN 
ESTIMATES  ON  FINITE  SETS  OF  PARAMETERS 

In this chapter we study the convergence of maximum likelihood and
Bayesian parameter estimates for general classes of observation
sequences. The convergence of the estimates follows from the convergence
of the likelihood ratios over the parameter set. Consistency conditions
are derived in terms of the information in the observations. The case
where the true parameter is not necessarily a member of the parameter
set is also considered. Rates of convergence in the mean for the ML and
the MAP procedures are derived.

4.1  Convergence  of  Parameter  Estimates 

Let $(z_n)$ be a stochastic process on a probability space $(\Omega, \mathcal{U}, P_*)$
and let $S = K = \{0, \ldots, p\}$ be a parameter set such that $\{P_j ; j \in K\}$ is a
family of probability measures on $(\Omega, \mathcal{U})$. Let $(\mathcal{U}_n)$ be an increasing
sequence of $\sigma$-subalgebras of $\mathcal{U}$ generated by $(z_n)$ and let $P_{j,n}$ be the
restriction of $P_j$ to $\mathcal{U}_n$ for each $j \in K$. Consider the following
condition:

(c4.1)  For some $k \in K$ and for each $j \in K$ ; $j \ne k$

    $$\lim_{n\to\infty} h_k^j(Z^n) = 0 \quad \text{a.e.}$$                              (4.1)

In the sequel we show that the convergence a.e. of the ML and the MAP
parameter estimation procedures and the convergence a.e. and in m.s. of
the LS procedure follow from condition (c4.1). Of course, the major
difficulty in proving convergence of the parameter estimates is to
verify condition (c4.1). In the following sections we give conditions
for general classes of observation sequences under which condition (c4.1)
is satisfied when $k$ is the true parameter and extend the results to the
case where the true parameter is not necessarily a member of the
parameter set. The latter case is treated specifically in the following
chapter where the following theorems will prove very useful.

Theorem  4.1 

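Suppose that (c4.1) is satisfied, then ML estimates on $K$ converge
a.e. to $k$ as $n \to \infty$.

Proof

Since the set $j \in K$ ; $j \ne k$ is finite, (c4.1) implies

    $$\lim_{n\to\infty} \sup_j \left\{ h_k^j(Z^n) ; \ j \in K ; \ j \ne k \right\} = 0 \quad \text{a.e.}$$

Hence

    $$\lim_{n\to\infty} \sup_j \left\{ h_k^j(Z^n) ; \ j \in K \right\} = h_k^k(Z^n) = 1 \quad \text{a.e.}$$

or

    $$\lim_{n\to\infty} \hat{k}(Z^n) = k \quad \text{a.e.}$$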


    $$\lim_{n\to\infty} f^b(j|Z^n) = 0 \quad \text{a.e. for each } j \in K ; \ j \ne k$$        (4.2a)

But since

    $$\sum_{j=0}^{p} f^b(j|Z^n) = 1$$

we have

    $$\lim_{n\to\infty} f^b(k|Z^n) = 1 \quad \text{a.e.}$$                              (4.2b)

yielding the assertion.

Theorem  4.3 

Suppose that a parameter vector $s$ is assumed to belong to a finite
set $\{s_j \in R^m ; j \in K\}$ in the calculation of the estimates (but is not
necessarily a member of the set). Suppose further that for some $k \in K$
condition (c4.1) is satisfied. Then LS estimates of $s$ on $R^m$ converge
a.e. to $s_k$.

Proof

By (2.20) and (4.2) we have

    $$\lim_{n\to\infty} \hat{s}_n = \sum_{j=0}^{p} s_j \lim_{n\to\infty} f^b(j|Z^n) = s_k \quad \text{a.e.}$$


Theorem  4.4 


For the situation given in theorem 4.3 LS estimates converge to $s_k$ in the mean-square.

Proof

We follow in part Liporace [1971] who treated the case of independent and identically distributed observations. Consider the norm

$$\begin{aligned}
N_n &= E_*\{(\hat{s}_n - s_k)^T (\hat{s}_n - s_k)\} \\
&= E_*\Big\{ \Big[\sum_{j=0}^{p} (s_j - s_k) f^b(s_j \mid Z^n)\Big]^T \Big[\sum_{i=0}^{p} (s_i - s_k) f^b(s_i \mid Z^n)\Big] \Big\} \\
&= \sum_{j=0}^{p} \sum_{i=0}^{p} (s_j - s_k)^T (s_i - s_k)\, E_*\{ f^b(s_j \mid Z^n) f^b(s_i \mid Z^n) \} \\
&\le p^2 R^2\, E_* f^b(s_j \mid Z^n) \quad \text{for some } j \in K ;\ j \neq k
\end{aligned}$$

where

$$R^2 = \max \{ (s_j - s_k)^T (s_j - s_k) ;\ j \in K ;\ j \neq k \}$$

since obviously

$$E_* f^b(s_j \mid Z^n) \ge E_*\{ f^b(s_j \mid Z^n) f^b(s_i \mid Z^n) \}$$

(because $f^b(s_i \mid Z^n) \le 1$).

By (2.19) we have for each $j \in K$

$$f^b(s_j \mid Z^n) \le \frac{f^a(s_j)}{f^a(s_k)}\, h_k^j(Z^n)$$

By (c4.1) we have for each $j \in K ;\ j \neq k$

$$\lim_{n\to\infty} f^b(s_j \mid Z^n) \le \frac{f^a(s_j)}{f^a(s_k)} \lim_{n\to\infty} h_k^j(Z^n) = 0 \quad \text{a.e.}$$

Hence

$$\lim_{n\to\infty} f^b(s_j \mid Z^n) = 0 \quad \text{a.e.}$$

Now since

$$f^b(s_j \mid Z^n) \le 1$$

we have by the dominated convergence theorem (theorem 2.3) that for each $j \in K ;\ j \neq k$

$$\lim_{n\to\infty} E_* f^b(s_j \mid Z^n) = E_* \lim_{n\to\infty} f^b(s_j \mid Z^n) = 0$$

and thus

$$\lim_{n\to\infty} N_n \le p^2 R^2 \lim_{n\to\infty} E_* f^b(s_j \mid Z^n) = 0$$

yielding finally

$$\lim_{n\to\infty} N_n = 0 .$$
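As an illustration of theorems 4.3 and 4.4, the following is a minimal sketch of the LS estimate as a posterior-weighted mean of the candidate parameter vectors. The function names, the candidate vectors and the log-likelihood values are hypothetical; the only thing taken from the text is that the estimate is $\sum_j s_j f^b(s_j \mid Z^n)$ and that the a posteriori probability of $s_k$ tends to one when its likelihood dominates.

```python
import math

def posterior(log_liks, priors):
    """A posteriori probabilities from per-model log-likelihoods and prior weights."""
    scores = [ll + math.log(p) for ll, p in zip(log_liks, priors)]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def ls_estimate(params, post):
    """Posterior-weighted mean of the candidate parameter vectors."""
    dim = len(params[0])
    return [sum(p * s[i] for p, s in zip(post, params)) for i in range(dim)]

# Hypothetical candidate set and prior (assumptions for illustration only).
params = [[0.0, 1.0], [2.0, -1.0], [5.0, 5.0]]
priors = [1 / 3, 1 / 3, 1 / 3]

# Hypothetical log-likelihoods in which model k = 1 dominates more and more.
for gap in (1.0, 10.0, 100.0):
    post = posterior([-gap, 0.0, -2.0 * gap], priors)
    print(gap, ls_estimate(params, post))
```

As the gap grows, the printed estimate approaches the candidate `[2.0, -1.0]`, which mirrors the convergence of the LS estimate to $s_k$.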


4.2  Consistency  of  the  Estimates 

In Chapter 3 we defined for each pair $k, j \in K$

$$I_n(k;j) = E_*^{n-1} \log h_j^k(z_n \mid Z^{n-1})$$

Let us also define

$$J_n(k;j) = \log h_j^k(z_n \mid Z^{n-1}) - I_n(k;j)$$

$J_n(k;j)$ is the error in the incremental information $I_n(k;j)$, or the information residual. Denote

$$Y_n(k;j) = \sum_{m=1}^{n} I_m(k;j)$$

and

$$V_n(k;j) = \sum_{m=1}^{n} J_m(k;j)$$

Note that for each $j, k \in K$, $(J_n(k;j))$ is a $\mathcal{U}_n$-martingale difference sequence according to the true measure $P_*$ and consequently $(V_n(k;j))$ is a $\mathcal{U}_n$-martingale sequence according to $P_*$.

Suppose that $* \in K$, i.e. that the true parameter is a member of the parameter set, and consider the following conditions:

(c4.2) For some $k \in K$ and for each $j \in K ;\ j \neq k$

$$\limsup_{n\to\infty} V_n(k;j) > -\infty \quad \text{a.e.}$$

(c4.3) For some $k \in K$ and for each $j \in K ;\ j \neq k$

$$\lim_{n\to\infty} Y_n(k;j) = \infty \quad \text{a.e.}$$

Lemma  4.1 

Suppose that conditions (c4.2) and (c4.3) hold for $k = *$. Then for each $j \in K ;\ j \neq *$ one has

$$\lim_{n\to\infty} h_*^j(Z^n) = 0 \quad \text{a.e.}$$

Proof

We have noted (see section 2.4) that for each $j \in K$ the sequence $(h_*^j(Z^n))$ is a $\mathcal{U}_n$-martingale according to the measure $P_*$. Furthermore

$$E_* h_*^j(Z^n) = 1$$

It follows from the martingale convergence theorem (theorem 2.4) that the sequence $(h_*^j(Z^n))$ converges to a finite limit. Thus, the sequence $(\log h_*^j(Z^n))$ converges to some $a < \infty$. We have

$$\log h_j^*(Z^n) = Y_n(*;j) + V_n(*;j) \tag{4.3}$$

Suppose that

$$\lim_{n\to\infty} \log h_*^j(Z^n) = a > -\infty \quad \text{a.e.}$$

or

$$\lim_{n\to\infty} \log h_j^*(Z^n) < \infty \quad \text{a.e.}$$

Then by condition (c4.3) and by (4.3) we have

$$\lim_{n\to\infty} V_n(*;j) = -\infty \quad \text{a.e.}$$

contradicting condition (c4.2). Hence, we have

$$\lim_{n\to\infty} \log h_j^*(Z^n) = \infty \quad \text{a.e.}$$

or

$$\lim_{n\to\infty} \log h_*^j(Z^n) = -\infty \quad \text{a.e.}$$

yielding

$$\lim_{n\to\infty} h_*^j(Z^n) = 0 \quad \text{a.e.}$$

■


Theorem  4.5 

Suppose that some $k \in K$ is the true parameter. Then under conditions (c4.2) and (c4.3) ML and MAP estimates are consistent a.e. and LS estimates are consistent a.e. and in the mean square.

Proof

The assertion follows directly from lemma 4.1 and from theorems 4.1 through 4.4. ■

Consider  the  following  condition 

(c4.4) For some $k \in K$ and for each $j \in K ;\ j \neq k$ there exists some $\epsilon_j > 0$ and a subsequence $(n_{i_j})$ of $n$ such that

$$I_{n_{i_j}}(k;j) \ge \epsilon_j \quad \text{a.e. for all } n_{i_j}$$

Theorem 4.6

Suppose that some $k \in K$ is the true parameter. Then under conditions (c4.2) and (c4.4) ML and MAP estimates are consistent a.e. and LS estimates are consistent a.e. and in the mean square.

Proof

By theorem 3.1 we have

$$I_n(k;j) \ge 0 \quad \text{a.e. for all } n > 0$$

Thus, condition (c4.4) implies condition (c4.3). The assertion follows from theorem 4.5. ■

In the following chapters we shall see certain important cases to which the information condition (c4.3) applies. We now examine condition (c4.2). We have noted that for each pair $j, k \in K$ the sequence $(J_n(k;j))$ is a martingale difference sequence according to the true measure $P_*$. The following special case is of particular interest.

Lemma  4.2 

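For any pair $j, k \in K$ let $(J_n(k;j))$ be an ergodic sequence. Then

$$\limsup_{n\to\infty} V_n(k;j) = \infty \quad \text{a.e.}$$

Proof

We have by (2.21) for each $\omega \in \Omega$

$$V_n(k;j,\omega) = J_1(k;j,\omega) + V_{n-1}(k;j,T\omega)$$

where $T$ is a measure preserving transformation. It follows that the event

$$\Big\{ \limsup_{n\to\infty} V_n(k;j) < \infty \Big\}$$

is invariant. Thus, either

$$P\Big\{ \limsup_{n\to\infty} V_n(k;j) < \infty \Big\} = 0$$

or

$$P\Big\{ \limsup_{n\to\infty} V_n(k;j) < \infty \Big\} = 1$$

Obviously, we have that if

$$\limsup_{n\to\infty} V_n(k;j) < \infty$$

then

$$\limsup_{n\to\infty} \frac{V_n(k;j)}{\sqrt{n}} < \infty$$

But by theorem 2.6

$$P\Big\{ \limsup_{n\to\infty} \frac{V_n(k;j)}{\sqrt{n}} < \infty \Big\} < 1$$

Hence

$$P\Big\{ \limsup_{n\to\infty} V_n(k;j) < \infty \Big\} < 1$$

yielding

$$P\Big\{ \limsup_{n\to\infty} V_n(k;j) < \infty \Big\} = 0$$

that is,

$$\limsup_{n\to\infty} V_n(k;j) = \infty \quad \text{a.e.}$$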


Example  4.1 


Let  (xn)  be  a sequence  of  independent  identically  distributed  ob- 
servations. Suppose  that  each  xn  is  distributed  according  to  the  den- 


f(V  - 


Let the covariance $\sigma^2$ be given on a set $\{\sigma_i^2 ,\ i = 1, 2\}$, and suppose that $\sigma_*^2 = \sigma_1^2$, i.e. 1 is the true parameter. As in example 3.1 we have for all $n > 0$

$$I_n(1;2) = \frac{1}{2} \log \frac{\sigma_2^2}{\sigma_1^2} + \frac{\sigma_1^2}{2\sigma_2^2} - \frac{1}{2} = \frac{1}{2} \log \frac{\sigma_2^2}{\sigma_1^2} + \frac{\sigma_1^2}{2} \Big( \frac{1}{\sigma_2^2} - \frac{1}{\sigma_1^2} \Big)$$

Since $(x_n)$ is an ergodic sequence, so is $(J_n(1;2))$. Thus, by lemma 4.2 condition (c4.2) is satisfied. It follows from (3.5) that if $\sigma_1 \neq \sigma_2$ then $I_n(1;2) \neq 0$ and then, by theorem 3.1 we have

$$I_n(1;2) = I(1;2) > 0 \quad \text{for all } n > 0$$

Thus, condition (c4.3) is satisfied for $k = 1$. Hence, by theorems 4.1 through 4.4, the ML and the MAP estimates of $\sigma$ will converge a.e. and the LS estimates will converge a.e. and in the mean-square to $\sigma_1$.
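The information in example 4.1 can be checked numerically. The sketch below assumes zero-mean Gaussian densities (an assumption made here for illustration) and compares the closed form of $I(1;2)$ with the sample average of the log-likelihood ratio increments under the true variance. All function names are hypothetical.

```python
import math
import random

def gaussian_information(var_true, var_alt):
    """Closed form of I(1;2) for zero-mean Gaussians (assumed densities)."""
    return 0.5 * math.log(var_alt / var_true) + 0.5 * (var_true / var_alt - 1.0)

def log_ratio(x, var_true, var_alt):
    """log f_1(x) / f_2(x) for zero-mean Gaussian densities."""
    return (0.5 * math.log(var_alt / var_true)
            - 0.5 * x * x * (1.0 / var_true - 1.0 / var_alt))

def empirical_information(var_true, var_alt, n=20000, seed=0):
    """Sample average of the log-ratio over i.i.d. draws with the true variance."""
    rng = random.Random(seed)
    sd = math.sqrt(var_true)
    total = sum(log_ratio(rng.gauss(0.0, sd), var_true, var_alt) for _ in range(n))
    return total / n

print(gaussian_information(1.0, 4.0), empirical_information(1.0, 4.0))
```

The closed form is strictly positive whenever the two variances differ, which is the property used to verify condition (c4.3), and the sample average settles near it, as the ergodic argument suggests.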

The following general result provides a sufficient condition for condition (c4.2). Although it will not be used directly in the following chapters, it seems to have useful implications (see example 4.2).


Lemma  4.3 

Suppose that for any $j, k \in K$ we have for any positive scalar $a$

$$E_* | V_{n_a}(k;j) - a | < \infty \tag{4.4}$$

where

$$n_a = \inf \{ n : V_n(k;j) > a \} \tag{4.5}$$

then for each $\omega \in \Omega$ either

$$\lim_{n\to\infty} V_n(k;j,\omega) \quad \text{exists and is finite}$$

or

$$\limsup_{n\to\infty} V_n(k;j,\omega) = \infty$$

Proof 

Let

$$R_a(k;j) = V_{n_a}(k;j) - a$$

and

$$V_n^a(k;j) = V_{\min(n, n_a)}(k;j)$$

Note that

$$V_n^a(k;j) \le a + R_a(k;j)$$

Since $(V_n(k;j))$ is a $\mathcal{U}_n$-martingale, so is $(V_n^a(k;j))$. Obviously, we have

$$E_*\{ V_n^a(k;j) \} \le a + E_* R_a(k;j)$$

Hence, under (4.4)

$$E_*\{ V_n^a(k;j) \} < \infty$$

It follows from theorem 2.4 that the sequence $(V_n^a(k;j))$ converges to a finite limit. Let

$$A_a = \{ \omega \in \Omega : \sup_n V_n(k;j,\omega) < a \}$$

and

$$A = \bigcup_{a=1}^{\infty} A_a$$

If $\omega \in A$ then $\omega \in A_a$ for some $a$, say, $a_0$. Then,

$$V_n(k;j,\omega) = V_n^{a_0}(k;j,\omega) \quad \text{for all } n$$

and then

$$\lim_{n\to\infty} V_n(k;j,\omega) = \lim_{n\to\infty} V_n^{a_0}(k;j,\omega) \quad \text{is finite.}$$

If $\omega \notin A$, then

$$\limsup_{n\to\infty} V_n(k;j,\omega) = \infty .$$


Example  4.2 

Let $(x_n)$ be a sequence of real valued random variables taking values in the interval [0, 3]. Suppose that the sequence $(x_n)$ is not necessarily independent or identically distributed. Consider two hypotheses (or two parameters) 1 and 2, according to which $(x_n)$ is i.i.d. with probability densities

$$f_1(x_n) = \begin{cases} \ldots & 0 \le x_n \le 1 \\ \ldots & 1 \le x_n \le 3 \\ 0 & \text{elsewhere} \end{cases}$$

$$f_2(x_n) = \begin{cases} \ldots & 0 \le x_n \le 2 \\ \ldots & 2 \le x_n \le 3 \\ 0 & \text{elsewhere} \end{cases}$$

It is easy to see that

$$J_n(1;2) \le 2 \log 2 \quad \text{for all } n$$

independently of the actual values the sequence $(x_n)$ might take in the interval [0, 3]. Now since for any $a > 0$

$$V_{n_a}(1;2) = V_{n_a - 1}(1;2) + J_{n_a}(1;2) \le a + J_{n_a}(1;2)$$

we have

$$E_*\{ V_{n_a}(1;2) - a \} \le E_* J_{n_a}(1;2) \le 2 \log 2$$

for all $n$. Hence (4.4) holds. It follows from lemma 4.3 that condition (c4.2) is satisfied for this case independently of the actual probability measure generating the sequence $(x_n)$.

4.3  Convergence  in  the  Absence  of  the  True  Parameter 

Consider the probability and parameter spaces given in section 4.1.

While the absolute continuity of the restrictions $P_{1,n}$ and $P_{2,n}$ of two measures $P_1$ and $P_2$ to the σ-subalgebra $\mathcal{U}_n$ of $\mathcal{U}$ is possible to verify in practical situations (it follows e.g. from the absolute continuity of the corresponding conditional densities $f_1(z_n \mid Z^{n-1})$ and $f_2(z_n \mid Z^{n-1})$ for each $n$), the absolute continuity of $P_1$ and $P_2$ does not follow and is, in general, more difficult to verify. The following results are nevertheless interesting from a theoretical viewpoint.


Theorem  4.8 

Let conditions (c4.2) and (c4.3) hold for some parameter $k \in K$. Furthermore, suppose that the true measure $P_*$ is absolutely continuous with respect to the measure $P_k$. Then for each $j \in K ;\ j \neq k$ one has

$$\lim_{n\to\infty} h_k^j(Z^n) = 0$$

and, consequently, the parameter estimates will converge to the parameter $k$ in the senses specified in theorems 4.1 through 4.4.

Proof

Since the sequence $(h_k^j(Z^n))$ is a $(\mathcal{U}_n, P_k)$-martingale and since

$$E_k h_k^j(Z^n) = 1$$

it follows from theorem 2.4 that $(h_k^j(Z^n))$ converges a.e. $P_k$ to a finite random variable. Since $P_*$ is absolutely continuous with respect to $P_k$, then $(h_k^j(Z^n))$ converges to a finite random variable a.e. $P_*$.

The remainder of the proof is identical to the proof of lemma 4.1, and the convergence of the estimates follows from the convergence of the likelihood ratios by theorems 4.1 through 4.4. ■

In the following chapter we shall treat the case where the true parameter is not necessarily a member of the parameter set for a case of practical interest, namely, linear dynamical systems. We shall not, however, investigate the absolute continuity of the probability measures $P_*$, $P_k$, but rather use simpler arguments, enabled by the particular problem under consideration.

Condition (c2.1) requires that for any parameter $k \in K$, the restrictions $P_{j,n}$ of the measures $P_j ,\ j \in K ;\ j \neq k$ be absolutely continuous with respect to the restriction $P_{k,n}$ of the measure $P_k$. An interesting observation is given in the following theorem.

Theorem  4.9 

Suppose that condition (c4.1) holds for the parameter $k \in K$. Then the measures $P_j ,\ j \in K ;\ j \neq k$ are singular with respect to the measure $P_k$.

Proof

For each $j \in K$ the likelihood ratio sequence $\big( f_k(Z^n) / f_j(Z^n) \big)$ is a martingale according to the measure $P_j$ (Doob [1953], p. 93). In addition, we have

$$E_j \frac{f_k(Z^n)}{f_j(Z^n)} = 1 \tag{4.6}$$

By the martingale convergence theorem we then have

$$\lim_{n\to\infty} \frac{f_k(Z^n)}{f_j(Z^n)} = \text{finite r.v.} \quad \text{a.e. } P_j$$

(where r.v. denotes random variable).

But under condition (c4.1)

$$\lim_{n\to\infty} \frac{f_k(Z^n)}{f_j(Z^n)} = \infty \quad \text{a.e. } P_k$$

So we have

$$P_j \Big\{ \lim_{n\to\infty} \frac{f_k(Z^n)}{f_j(Z^n)} = \text{finite r.v.} \Big\} = 1$$

and also

$$P_k \Big\{ \lim_{n\to\infty} \frac{f_k(Z^n)}{f_j(Z^n)} = \text{finite r.v.} \Big\} = 0$$

Hence, under condition (c4.1) the measures $P_j ;\ j \in K ;\ j \neq k$ are singular with respect to the measure $P_k$.

4.4  L1 Convergence

The convergence of the likelihood and a posteriori probability ratios follows directly if condition (c4.1) holds. We show that under a certain condition on the information in the observations the convergence rates are bounded by exponentials of the number of samples. The true parameter is not assumed to belong to the parameter set. These results provide performance measures for the ML and MAP estimation methods. In the following chapters we show that bounds of the convergence rates can be computed in common situations for linear systems.


Theorem  4.10 

Suppose that condition (c4.1) holds for some $k \in K$. Then for each $j \in K ;\ j \neq k$ we have

$$\lim_{n\to\infty} E_* h_j^k(Z^n) = \infty$$

and

$$\lim_{n\to\infty} E_* \frac{f^b(k \mid Z^n)}{f^b(j \mid Z^n)} = \infty$$

Proof

We have by (c4.1) for each $j \in K ;\ j \neq k$

$$\lim_{n\to\infty} h_j^k(Z^n) = \infty \quad \text{a.e.}$$

and by (4.2)

$$\lim_{n\to\infty} \frac{f^b(k \mid Z^n)}{f^b(j \mid Z^n)} = \infty \quad \text{a.e.}$$

Since both sequences are non-negative, we have by Fatou's lemma (theorem 2.2)

$$\lim_{n\to\infty} E_* h_j^k(Z^n) \ge \liminf_{n\to\infty} E_* h_j^k(Z^n) \ge E_* \liminf_{n\to\infty} h_j^k(Z^n) = \infty$$

and

$$\lim_{n\to\infty} E_* \frac{f^b(k \mid Z^n)}{f^b(j \mid Z^n)} \ge \liminf_{n\to\infty} E_* \frac{f^b(k \mid Z^n)}{f^b(j \mid Z^n)} \ge E_* \liminf_{n\to\infty} \frac{f^b(k \mid Z^n)}{f^b(j \mid Z^n)} = \infty$$

■

Now consider the following condition

(c4.5) There exists a parameter $k \in K$ such that for each $j \in K ;\ j \neq k$ there exists a positive scalar $\alpha_j$ and a positive integer $N_j$, such that

$$I_n(k;j) \ge \alpha_j \quad \text{for all } n \ge N_j \tag{4.7}$$


Theorem  4.11 


Under condition (c4.5) there exists some positive integer $N$ such that for each $j \in K ;\ j \neq k$ the sequences $(h_j^k(Z^n))$ and $\big( f^b(k \mid Z^n) / f^b(j \mid Z^n) \big)$ diverge in $L_1$ at rates no slower than exponential for all $n \ge N$.


Proof 


We have

$$I_n(k;j) = E_* \log \frac{f_k(z_n \mid Z^{n-1})}{f_j(z_n \mid Z^{n-1})} = E_* \log \frac{f_k(Z^n)}{f_j(Z^n)} - E_* \log \frac{f_k(Z^{n-1})}{f_j(Z^{n-1})}$$

By (c4.5) we then have

$$E_* \log \frac{f_k(Z^n)}{f_j(Z^n)} - E_* \log \frac{f_k(Z^{n-1})}{f_j(Z^{n-1})} \ge \alpha_j \quad \text{for all } n \ge N_j$$

yielding

$$E_* \log \frac{f_k(Z^n)}{f_j(Z^n)} \ge a_j + (n - N_j + 1)\,\alpha_j \quad \text{for all } n \ge N_j \tag{4.8}$$

where

$$a_j = E_* \log \frac{f_k(Z^{N_j - 1})}{f_j(Z^{N_j - 1})} \tag{4.9}$$

Since $\log(\cdot)$ is a concave function, we have by Jensen's inequality (theorem 2.1)

$$\log E_* \frac{f_k(Z^n)}{f_j(Z^n)} \ge E_* \log \frac{f_k(Z^n)}{f_j(Z^n)} \tag{4.10}$$

(4.8), (4.9) and (4.10) give

$$E_* h_j^k(Z^n) = E_* \frac{f_k(Z^n)}{f_j(Z^n)} \ge e^{a_j}\, e^{(n - N_j + 1)\alpha_j} \tag{4.11}$$

for all $n \ge N_j$, for each $j \in K ;\ j \neq k$.

Hence for each $j \in K ;\ j \neq k$ the likelihood ratio $h_j^k(Z^n)$ tends in the mean to infinity faster than an exponential with a rate of $\alpha_j$.

By (2.19) we have for each $j \in K$

$$\frac{f^b(k \mid Z^n)}{f^b(j \mid Z^n)} = \frac{f^a(k)}{f^a(j)}\, \frac{f_k(Z^n)}{f_j(Z^n)}$$

Thus, by (4.11) for each $j \in K ;\ j \neq k$

$$E_* \frac{f^b(k \mid Z^n)}{f^b(j \mid Z^n)} = \frac{f^a(k)}{f^a(j)}\, E_* \frac{f_k(Z^n)}{f_j(Z^n)} \ge \frac{f^a(k)}{f^a(j)}\, e^{a_j}\, e^{(n - N_j + 1)\alpha_j} \quad \text{for all } n \ge N_j \tag{4.12}$$

Hence, for each $j \in K ;\ j \neq k$ the a posteriori probability ratio $f^b(k \mid Z^n) / f^b(j \mid Z^n)$ tends in the mean to infinity faster than an exponential with a rate of $\alpha_j$.

Finally, taking $N = \max \{ N_j ;\ j \in K ;\ j \neq k \}$ and $\alpha = \min \{ \alpha_j ;\ j \in K ;\ j \neq k \}$ we have that the sequences $(h_j^k(Z^n))$ and $\big( f^b(k \mid Z^n) / f^b(j \mid Z^n) \big)$ converge in $L_1$ to infinity faster than an exponential with a rate of $\alpha$ for all $n \ge N$.
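The step from condition (c4.5) to the exponential bound can be written out as a telescoping sum. The following is a sketch of that step only, with $\alpha_j$ and $N_j$ the scalar and integer of (c4.5) and $a_j$ denoting the expected log-likelihood ratio accumulated before $N_j$. The count $(n - N_j + 1)$ assumes that the sum runs over $m = N_j, \ldots, n$ inclusive.

```latex
\begin{aligned}
E_* \log \frac{f_k(Z^n)}{f_j(Z^n)}
  &= E_* \log \frac{f_k(Z^{N_j-1})}{f_j(Z^{N_j-1})} + \sum_{m=N_j}^{n} I_m(k;j) \\
  &\ge a_j + (n - N_j + 1)\,\alpha_j , \\
E_* \frac{f_k(Z^n)}{f_j(Z^n)}
  &\ge \exp\Big\{ E_* \log \frac{f_k(Z^n)}{f_j(Z^n)} \Big\}
   \ge e^{a_j}\, e^{(n - N_j + 1)\alpha_j} .
\end{aligned}
```

The first line uses the fact that each increment of the expected log-likelihood ratio is $I_m(k;j)$, and the last line is Jensen's inequality applied to the concave logarithm.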


At instant $n$ the ML estimation method will select the parameter $k$ if

$$\frac{f_k(Z^n)}{f_j(Z^n)} > 1 \quad \text{for all } j \in K ;\ j \neq k \tag{4.13}$$

The MAP method will select $k$ if

$$\frac{f^b(k \mid Z^n)}{f^b(j \mid Z^n)} > 1 \quad \text{for all } j \in K ;\ j \neq k \tag{4.14}$$

Hence, the $L_1$ convergence bounds established in theorem 4.11 provide a qualitative measure of performance for the ML and the MAP estimates in terms of rates at which (4.13) and (4.14) are attained in the mean.

Of course, the bounds cannot be computed unless the true measure is known. Yet, if the true parameter can be assumed to belong to a finite set, then bounds can be computed over the set. This will be demonstrated in the following chapters, where we consider linear systems.
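The selection rules (4.13) and (4.14) amount to picking the index with the largest likelihood, or the largest likelihood weighted by the prior. A minimal sketch follows, working with log-likelihoods for numerical convenience. The function names and the numbers are hypothetical.

```python
import math

def ml_select(log_liks):
    """Index k whose likelihood ratio against every other j exceeds 1."""
    return max(range(len(log_liks)), key=lambda j: log_liks[j])

def map_select(log_liks, priors):
    """Index k whose a posteriori probability ratio against every other j exceeds 1."""
    return max(range(len(log_liks)),
               key=lambda j: log_liks[j] + math.log(priors[j]))

# Hypothetical log-likelihoods log f_j(Z^n) and prior weights.
log_liks = [-10.0, -9.5, -12.0]
priors = [0.8, 0.1, 0.1]
print(ml_select(log_liks), map_select(log_liks, priors))
```

With these numbers the two rules disagree: the ML rule picks index 1, while the strong prior on index 0 makes the MAP rule pick index 0. As the likelihood ratios diverge with $n$, the influence of the prior fades and the two selections coincide.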


CHAPTER  V 


STATIONARY  LINEAR  SYSTEMS 

In this chapter we restrict our attention to linear systems driven by white Gaussian inputs having time-invariant statistics. We make the assumption that the system has attained steady state, i.e. that all signals of interest are stationary. We first study the convergence of identification procedures. The convergence conditions are obtained in terms of the second order statistics associated with the models in the model set. If the true model is included in the set, it will be identified under a verifiable uniqueness condition. If the true model is not included in the model set, then the identification procedures converge to a model in the set which is closest to the true model in the information metric sense, introduced in Chapter 3, and in the sense of the second-order statistics associated with the models. Then we treat the convergence of the likelihood ratios and the ratios of a posteriori probabilities. We show that under a simple uniqueness condition the sequences of likelihood and a posteriori probability ratios are bounded in $L_1$ by simple exponentials. If the true system belongs to the given model set, then the bounds can be easily computed using the a priori data. The $L_1$ convergence results provide performance measures for the ML and the MAP identification methods. Finally, the analysis is extended to other modeling problems. Methods are suggested for selecting a reduced order model to represent a high order system and for selecting a representative model from a set from which the true system, or an appropriate model of it, are known to take their values.

The  convergence  of  the  identification  procedures  is  proved  by 
direct  application  of  the  ergodic  theorem.  This  chapter  then  depends 
only  on  the  results  of  Chapter  3 and  section  4.1  and  the  more  advanced 
probabilistic  arguments  used  in  Chapter  4 are  omitted.  (Note  that 
since  we  consider  here  a very  specific  class  of  observation  sequences, 
we  are,  in  fact,  able  to  treat  a more  interesting  class  of  problems 
than  that  considered  in  section  4.2,  as  the  true  parameter  is  not 
assumed  to  belong  to  the  parameter  set.) 


5.1  Models  and  Densities 


Consider the system

$$\begin{aligned} x_{n+1} &= F_* x_n + G_* w_n \\ z_n &= H_* x_n + v_n \end{aligned} \tag{5.1a}$$

initialized at $n = n_0$ with $E x_{n_0} = 0$ and $E x_{n_0} x_{n_0}^T$ given, where $(w_n)$ and $(v_n)$ are uncorrelated and mutually uncorrelated Gaussian sequences with

$$E w_n = E v_n = 0 , \qquad E(w_n w_n^T) = Q_* , \qquad E(v_n v_n^T) = R_* \tag{5.1b}$$

The model set is a finite set of models for (5.1) denoted by

$$M = \{ (F_j, G_j, H_j, Q_j, R_j) ;\ j \in K = \{0, \ldots, p\} \} \tag{5.2}$$

Let

$$K' = \{*\} \cup K$$

(As in Chapter 4, the restriction to a finite set is done for the analysis of convergence and consistency. In section 5.4 we consider other modeling problems and there the model set is allowed to be infinite. Also note that the results of this chapter can easily be extended to the case where the system (5.1a) is driven by an additional deterministic input sequence.)

Let

$$\hat{z}_{j,n} ,\quad j \in K'$$

denote the one-step least squares prediction of $z_n$, given the past observations $Z^{n-1}$, assuming that the j'th model is the true one. For each $i, j \in K'$ let

$$\Sigma_{j,n} = \Sigma_j(n, n_0) \triangleq E_j \{ (z_n - \hat{z}_{j,n})(z_n - \hat{z}_{j,n})^T \} \tag{5.3}$$

and

$$\Gamma_{j,n}^i = \Gamma_j^i(n, n_0) \triangleq E_i \{ (z_n - \hat{z}_{j,n})(z_n - \hat{z}_{j,n})^T \} \tag{5.4}$$

denote the prediction error covariance matrices according to the respective measures. (For each $j \in K'$, $\hat{z}_{j,n}$ and $\Sigma_{j,n}$ are computed, in essence, by a Kalman-Bucy filter corresponding to the j'th model.) Denote

$$\Sigma_j \triangleq \lim_{n_0 \to -\infty} \Sigma_j(n, n_0) \tag{5.5}$$

provided that the limit exists.

We shall use the following condition:

(c5.1) For each $j \in K'$, $\Sigma_j$ exists and has a finite positive definite value.

A sufficient condition for (c5.1) is that each model corresponding to $j \in K'$ is detectable and controllable. For each $j \in K'$, $\Sigma_j$ is obtained by running a Riccati equation, or equivalently, a Kalman-Bucy filter corresponding to $(F_j, G_j, H_j, Q_j, R_j)$.
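Also denote

$$\Gamma_j^i = \lim_{n_0 \to -\infty} \Gamma_j^i(n, n_0) \tag{5.6}$$

provided that the limit exists. $\Gamma_j^i$ is obtained by the following procedure. First, assuming $n_0 \to -\infty$ take $\Sigma_{j,n} = \Sigma_j$ for each $j \in K'$ and for all $n \ge n_1$, where $n_1$ is any fixed integer. Then the dynamic equation generating simultaneously, according to the measure $P_i$, the state $x_{i,n}$ and its one step prediction $\hat{x}_{j,n}$ by the j'th Kalman-Bucy filter is

$$\begin{bmatrix} x_{i,n+1} \\ \hat{x}_{j,n+1} \end{bmatrix} = \begin{bmatrix} F_i & 0 \\ F_j K_j H_i & F_j (I - K_j H_j) \end{bmatrix} \begin{bmatrix} x_{i,n} \\ \hat{x}_{j,n} \end{bmatrix} + \begin{bmatrix} G_i & 0 \\ 0 & F_j K_j \end{bmatrix} \begin{bmatrix} w_n \\ v_n \end{bmatrix}$$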


where 


K.  ■ Z.  H?  (H.  1.  wF.  + R.) 
3 3 3 3 3 3 j 


-1 


Let

$$F_j^i = \begin{bmatrix} F_i & 0 \\ F_j K_j H_i & F_j (I - K_j H_j) \end{bmatrix} , \quad G_j^i = \begin{bmatrix} G_i & 0 \\ 0 & F_j K_j \end{bmatrix} , \quad Q^i = \begin{bmatrix} Q_i & 0 \\ 0 & R_i \end{bmatrix} , \quad H_j^i = \begin{bmatrix} H_i & -H_j \end{bmatrix}$$

Then the matrix

$$\Psi_{j,n}^i = E_i \left\{ \begin{bmatrix} x_{i,n} \\ \hat{x}_{j,n} \end{bmatrix} \begin{bmatrix} x_{i,n}^T & \hat{x}_{j,n}^T \end{bmatrix} \right\}$$

is generated by the Lyapunov equation

$$\Psi_{j,n+1}^i = F_j^i\, \Psi_{j,n}^i\, F_j^{iT} + G_j^i\, Q^i\, G_j^{iT} \tag{5.7}$$

initialized at $n_1$ by any initial value. We can write

$$\Psi_{j,n}^i = \Psi_j^i(n, n_1)$$

Then let

$$\Psi_j^i = \lim_{n_1 \to -\infty} \Psi_j^i(n, n_1) \tag{5.8}$$

Finally

$$\Gamma_j^i = H_j^i\, \Psi_j^i\, H_j^{iT} + R_i \tag{5.9}$$

It is well known that the limit (5.8) of (5.7) exists and is finite if the matrix $F_j^i$ has all its eigenvalues inside the unit circle. This is the case if for each $j \in K'$, $F_j$ has all its eigenvalues inside the unit circle and $(F_j, H_j)$ is observable. Note, however, that these conditions are only sufficient, not necessary, for $\Gamma_j^i$ to be finite, since (5.9) may be finite even if $\Psi_j^i$ obtained as the limit value of (5.7) is not finite.


Theorem  5.1 

For each $j \in K'$ let the corresponding model be stable and observable and let $n_0 = -\infty$. Then the residuals sequences $(z_n - \hat{z}_{j,n}) ;\ j \in K' ;\ n \ge 0$ are ergodic according to the true probability measure.

Proof

We have by (5.5) $\Sigma_{j,n} = \Sigma_j$ for all $n \ge 0$. Since both $(z_n)$ and $(\hat{z}_{j,n})$ are linear operators on a zero mean Gaussian sequence $(x_n)$, they are zero mean Gaussian, and so is the sequence $(z_n - \hat{z}_{j,n})$ for each $j \in K'$. Hence, $(z_n - \hat{z}_{j,n})$ is a zero mean stationary Gaussian sequence for each $j \in K'$. By proposition 2.1 we have that $(z_n - \hat{z}_{j,n})$ is ergodic if (and only if)

$$\lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} |R(k)|^2 = 0 \tag{5.10}$$

where $|R(k)|$ denotes the determinant of the matrix

$$R(k) = E_* \{ (z_n - \hat{z}_{j,n})(z_{n+k} - \hat{z}_{j,n+k})^T \} = \begin{cases} H_j^* \Psi_j^* H_j^{*T} + R_* & k = 0 \\ H_j^* \Psi_j^* (F_j^{*T})^k H_j^{*T} & k > 0 \end{cases} \tag{5.11}$$

We have for any $k > 0$

$$|R(k)| = |H_j^*|\, |\Psi_j^*|\, |F_j^*|^k\, |H_j^{*T}|$$

Since all eigenvalues of $F_j^*$ are inside the unit circle, then

$$|F_j^*| < 1$$

Hence

$$\lim_{n\to\infty} \sum_{k=1}^{n} |R(k)|^2 = |H_j^*|^2 |\Psi_j^*|^2 |H_j^{*T}|^2 \lim_{n\to\infty} \sum_{k=1}^{n} |F_j^*|^{2k} = |H_j^*|^2 |\Psi_j^*|^2 |H_j^{*T}|^2\, \frac{|F_j^*|^2}{1 - |F_j^*|^2} < \infty \tag{5.12}$$

yielding

$$\lim_{n\to\infty} \frac{1}{n+1} \sum_{k=0}^{n} |R(k)|^2 = 0$$

The assertion follows. ■


Note that the stability and observability of the models, assumed in theorem 5.1, are only sufficient, not necessary. In fact, we have proved ergodicity of the state residuals $(x_n - \hat{x}_{j,n})$, which is not necessary to show the ergodicity of $(z_n - \hat{z}_{j,n})$. In the sequel we shall directly use the following condition.

(c5.2) For each $j \in K'$ the residuals sequence $(z_n - \hat{z}_{j,n})$ is ergodic.

n j ,n 


5.2  Information,  Convergence  and  Consistency 

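Consider the system (5.1) and the model set (5.2). Let condition (c5.1) hold. Then the conditional probability density of $z_n$ given the past observations $Z^{n-1}$, corresponding to each model is

$$f_j(z_n \mid Z^{n-1}) = \big[ (2\pi)^{\ell} |\Sigma_{j,n}| \big]^{-1/2} \exp \Big\{ -\tfrac{1}{2} (z_n - \hat{z}_{j,n})^T \Sigma_{j,n}^{-1} (z_n - \hat{z}_{j,n}) \Big\} , \quad j \in K' \tag{5.13}$$

where $\ell$ is the dimension of $z_n$.

In Chapter 3 we have defined for each pair $j, k \in K'$

$$I_n(k;j) = E_* \log \frac{f_k(z_n \mid Z^{n-1})}{f_j(z_n \mid Z^{n-1})}$$

and

$$d_n(k;j) \triangleq | I_n(k;j) |$$

We have for each $j \in K'$

$$E_* \log f_j(z_n \mid Z^{n-1}) = -\frac{\ell}{2} \log 2\pi - \frac{1}{2} \log |\Sigma_j| - \frac{1}{2} \operatorname{tr} \Sigma_j^{-1} \Gamma_j^*$$

and for each pair $j, k \in K'$

$$I_n(k;j) = I(k;j) = \frac{1}{2} \log |\Sigma_j| + \frac{1}{2} \operatorname{tr} \Sigma_j^{-1} \Gamma_j^* - \frac{1}{2} \log |\Sigma_k| - \frac{1}{2} \operatorname{tr} \Sigma_k^{-1} \Gamma_k^*$$

Let

$$L_j^i \triangleq \log |\Sigma_j| + \operatorname{tr} \Sigma_j^{-1} \Gamma_j^i \qquad i, j \in K'$$

Then we have

$$I_n(k;j) = \frac{1}{2} \big[ L_j^* - L_k^* \big] \qquad j, k \in K' \tag{5.17}$$

Also, by theorem 3.1

$$I_n(*;j) \ge 0 \quad \text{for each } j \in K$$

Hence

$$d_n(*;j) = I_n(*;j) \quad \text{for each } j \in K$$

Thus, for any $j, k \in K$

$$d_n(*;j) - d_n(*;k) = I_n(*;j) - I_n(*;k) = I_n(k;j) = \frac{1}{2} \big[ L_j^* - L_k^* \big]$$

Hence

$$d(*;j) > d(*;k)$$

if and only if

$$L_j^* > L_k^* \tag{5.18}$$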


Lemma  5.1 

Let $(z_n)$ be generated by (5.1) and let condition (c5.1) hold. Then, under condition (c5.2), for any $j, k \in K$

$$\lim_{n\to\infty} h_k^j(Z^n) = 0 \quad \text{a.e.} \tag{5.19}$$

if

$$L_k^* < L_j^* \tag{5.20}$$

and only if

$$L_k^* \le L_j^* \tag{5.21}$$

Proof

$$\log h_k^j(Z^n) = \sum_{m=1}^{n} \log h_k^j(z_m \mid Z^{m-1}) \tag{5.22}$$

We have

$$\log h_k^j(z_n \mid Z^{n-1}) = \frac{1}{2} \log \frac{|\Sigma_k|}{|\Sigma_j|} - \frac{1}{2} (z_n - \hat{z}_{j,n})^T \Sigma_j^{-1} (z_n - \hat{z}_{j,n}) + \frac{1}{2} (z_n - \hat{z}_{k,n})^T \Sigma_k^{-1} (z_n - \hat{z}_{k,n}) \tag{5.23}$$

Since for each $j \in K$ the residuals $(z_n - \hat{z}_{j,n})$ are ergodic, it follows from the ergodic theorem (theorem 2.5) that

$$\lim_{n\to\infty} \frac{1}{n} \sum_{m=1}^{n} \log h_k^j(z_m \mid Z^{m-1}) = E_* \log h_k^j(z_m \mid Z^{m-1}) = I_m(j;k) = \frac{1}{2} \big( L_k^* - L_j^* \big) \quad \text{a.e.} \tag{5.24}$$

Now if

$$L_k^* < L_j^* \tag{5.25}$$

then obviously

$$\lim_{n\to\infty} \sum_{m=1}^{n} \log h_k^j(z_m \mid Z^{m-1}) = \lim_{n\to\infty} \log h_k^j(Z^n) = -\infty \quad \text{a.e.}$$

yielding

$$\lim_{n\to\infty} h_k^j(Z^n) = 0 \quad \text{a.e.}$$

To prove that (5.19) implies (5.21), suppose that (5.19) holds, but (5.21) does not; then

$$L_k^* > L_j^* \tag{5.26}$$

and by (5.24)

$$\lim_{n\to\infty} \sum_{m=1}^{n} \log h_k^j(z_m \mid Z^{m-1}) = \lim_{n\to\infty} \log h_k^j(Z^n) = \infty \tag{5.27}$$

implying

$$\lim_{n\to\infty} h_k^j(Z^n) = \infty$$

which contradicts (5.19). Thus, (5.19) implies (5.21). ■

Consider the following condition

(c5.3) There exists some parameter $k \in K$ such that

$$L_k^* < L_j^* \quad \text{for all } j \in K ;\ j \neq k \tag{5.28}$$


Theorem  5.2 

Consider the system (5.1) and the model set $M$ and let (c5.1) hold. Under conditions (c5.2) and (c5.3) the ML, the MAP and the LS identification methods will converge a.e. and the LS method will also converge in m.s. to the model $(F_k, G_k, H_k, Q_k, R_k)$.

Proof

By lemma 5.1 condition (c4.1) is satisfied for the parameter $k$. The assertion then follows from theorems 4.1 through 4.4. ■

Note that by (5.18) the identified model is the closest to the true model in the metric $d$.

Corollary  5.1 

The convergence specified in theorem 5.2 will be to a model in $M$ such that

$$| L_k^* - L_*^* | = \min \{ | L_j^* - L_*^* | ;\ j \in K \} \tag{5.29}$$

Proof

By theorem 3.1 we have

$$I(*;j) \ge 0 \quad \text{for each } j \in K$$

Hence

$$\frac{1}{2} \big[ L_j^* - L_*^* \big] \ge 0$$

or

$$L_j^* \ge L_*^*$$

So the assertion follows from lemma 5.1 and theorem 5.2. ■

The identification methods will then converge to a parameter in $K$, closest to the true parameter in the scalar $L$, which in turn implies closeness of the corresponding models in terms of their output statistics.

Corollary 5.2

Suppose that the true system belongs to the set $M$, i.e. let $(F_*, G_*, H_*, Q_*, R_*) = (F_r, G_r, H_r, Q_r, R_r)$ for some $r \in K$. Let conditions (c5.1) and (c5.2) hold and suppose that for each $j \in K ;\ j \neq r$ we have

$$L_j^* = L_j^r \neq L_r^r = L_*^* \tag{5.30}$$

Then the identification procedures will converge to the true model in the senses specified in theorem 5.2.


Proof 


The  result  follows  immediately  from  corollary  5.1. 


To compute the scalars $L_j^*$, $j \in K$ one must compute the matrices $\Sigma_j$ and $\Gamma_j^*$. While the matrix $\Sigma_j$ can be computed by running a Riccati equation corresponding to the j'th model to steady-state, the matrix $\Gamma_j^*$ cannot be computed unless the true measure or, equivalently, the true system is known. If $r \in K$ is the true parameter, then

$$\Gamma_r^* = \Gamma_r^r = \Sigma_r$$

and consequently

$$L_r^* = L_r^r = \log |\Sigma_r| + \ell \tag{5.31}$$

In the identification problem the true parameter is unknown. If the true parameter can be assumed to belong to the parameter set, then (5.30) will have to be checked for all pairs of parameters in the set, namely

(c5.4) For all pairs $i, j \in K ;\ i \neq j$

$$L_j^i \neq L_i^i = \log |\Sigma_i| + \ell \tag{5.32}$$
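The check of (c5.4) can be sketched for scalar models, where every matrix is a number and $\ell = 1$. The code below is only an illustration under stated assumptions: the function names are hypothetical, the models are made-up stable scalar systems $(F, G, H, Q, R)$, the Kalman gain is taken in its standard steady-state form, and the Riccati and Lyapunov equations are solved by plain fixed-point iteration.

```python
import math

def riccati(model, iters=2000):
    """Steady-state one-step state prediction error variance (scalar Riccati iteration)."""
    F, G, H, Q, R = model
    P = G * G * Q
    for _ in range(iters):
        P = F * F * (P - (P * H) ** 2 / (H * H * P + R)) + G * G * Q
    return P

def sigma(model):
    """Residual variance of a model's own filter: H^2 P + R."""
    F, G, H, Q, R = model
    return H * H * riccati(model) + R

def gamma(true_model, filt_model, iters=2000):
    """Residual variance of the filter of filt_model when true_model generates the data."""
    Fi, Gi, Hi, Qi, Ri = true_model
    Fj, Gj, Hj, Qj, Rj = filt_model
    Pj = riccati(filt_model)
    Kj = Pj * Hj / (Hj * Hj * Pj + Rj)          # standard steady-state gain (assumption)
    a11, a21, a22 = Fi, Fj * Kj * Hi, Fj * (1.0 - Kj * Hj)
    b1, b2 = Gi * Gi * Qi, (Fj * Kj) ** 2 * Ri
    p11 = p12 = p22 = 0.0                        # joint covariance of (x, x_hat)
    for _ in range(iters):                       # scalar form of the Lyapunov recursion
        p11, p12, p22 = (
            a11 * a11 * p11 + b1,
            a11 * (a21 * p11 + a22 * p12),
            a21 * a21 * p11 + 2 * a21 * a22 * p12 + a22 * a22 * p22 + b2,
        )
    return Hi * Hi * p11 - 2 * Hi * Hj * p12 + Hj * Hj * p22 + Ri

def L(true_model, filt_model):
    """Scalar L for the filter of filt_model under the measure of true_model."""
    s = sigma(filt_model)
    return math.log(s) + gamma(true_model, filt_model) / s

def satisfies_c54(models, tol=1e-9):
    """True if L under model i differs between filter j and filter i for all i != j."""
    return all(abs(L(mi, mj) - L(mi, mi)) > tol
               for a, mi in enumerate(models)
               for b, mj in enumerate(models) if a != b)

models = [(0.5, 1.0, 1.0, 1.0, 1.0), (0.9, 1.0, 1.0, 1.0, 1.0)]
print(satisfies_c54(models))
```

When the filter model coincides with the generating model, `gamma` reproduces `sigma`, so `L` reduces to $\log \Sigma + 1$, in agreement with (5.31) for $\ell = 1$.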

Theorem  5.3 

Let  the  system  (5.1)  belong  to  the  set  M^,  and  let  condi tions 
(c5.1)  and  (c5.2)  hold.  Then  under  conditions  (c5.4)  the  true  model  is 
identifiable  a.e.  by  the  ML  and  the  MAP  estimates  and  identifiable  a.e. 
and  in  ms.  by  the  LS  estimate. 

Proof 

Under  condition  (c5.4)  we  have  (5.30).  The  assertion  then  follows 
directly  from  corollary  5.2.^ 

5.3  L1 Convergence

We have shown in section 4.4 the $L_1$ convergence of the likelihood ratios and the a posteriori probability ratios under condition (c4.1). Furthermore, it was shown that under condition (c4.5) bounds on the $L_1$ convergence rates can be established, thus providing a measure of performance of the ML and the MAP estimation methods. We now show convergence and derive $L_1$ convergence bounds for the identification of stationary linear systems treated in this chapter.

Consider the system (5.1) and the model set $M$ and let condition (c5.1) hold. We have shown ((5.17)) that under condition (c5.2)

$$I_n(k;j) = I(k;j) = \frac{1}{2} \big[ L_j^* - L_k^* \big] \quad \text{for all } n$$

for each pair $k, j \in K$ where $L_j^*$, $j \in K$ are constants.


Theorem  5.4 


Consider the system (5.1) and the model set $M$ given by (5.2). Under conditions (c5.1) and (c5.3) for each $j \in K ;\ j \neq k$ the sequences $(h_j^k(Z^n))$ and $\big( f^b(k \mid Z^n) / f^b(j \mid Z^n) \big)$ converge in $L_1$ to infinity. Furthermore, the sequences converge at rates no slower than exponential.


Proof 


By lemma 5.1 condition (c5.3) implies condition (c4.1). The convergence of both sequences follows from theorem 4.10.

Now let

$$\alpha_j = \frac{1}{2} \big[ L_j^* - L_k^* \big] \quad \text{for each } j \in K ;\ j \neq k$$

then following the proof of theorem 4.11, we get by (4.11) and (4.12) for each $j \in K ;\ j \neq k$

$$E_* h_j^k(Z^n) \ge e^{(n+1)\alpha_j} \tag{5.33}$$

and

$$E_* \frac{f^b(k \mid Z^n)}{f^b(j \mid Z^n)} \ge \frac{f^a(k)}{f^a(j)}\, e^{(n+1)\alpha_j} \tag{5.34}$$

The rates $\alpha_j = I(k;j) ;\ j \in K ;\ j \neq k$ can only be computed, as discussed in section 5.2, if the true model is known. If the true model is only known to belong to the set $M$, then the rates can be bounded as follows. We have seen that if $k \in K$ is the true parameter, then ((5.31))

$$L_k^* = L_k^k = \log |\Sigma_k| + \ell$$

Now since

$$\alpha_j = \frac{1}{2} \big[ L_j^* - L_k^* \big] \quad \text{for each } j \in K ;\ j \neq k$$

where $k$ is now the true parameter, we have

$$\alpha_j = \frac{1}{2} \big[ L_j^k - \log |\Sigma_k| - \ell \big]$$

Then

$$\alpha_j \ge \alpha = \frac{1}{2} \min \Big\{ \min_j \big[ L_j^k - \log |\Sigma_k| - \ell \big] ;\ L_j^k - \log |\Sigma_k| - \ell > 0 \ \text{for all } j \in K ;\ j \neq k ;\ k \in K \Big\} \tag{5.35}$$


(5.35) reads as follows. For each $k \in K$ suppose that k is the true
parameter. If

    $$ L_j^k - \log|\Sigma_k| - \ell > 0 \quad \text{for all } j \in K;\ j \ne k $$    (5.36)

then take the min over j of the left hand side of (5.36). Continue the
procedure over all $k \in K$, discarding each k for which (5.36) does not
hold (since then k cannot be the true parameter, for which (5.36) always
holds). Then take the least of all the minimum values found above. Note
that this procedure does not identify the true parameter, but rather
finds a lower bound for $\alpha_j$ over $j \in K$.
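
The screening procedure behind (5.35) can be sketched in a few lines. This is only an illustration under stated assumptions, not part of the original analysis: the function name `rate_lower_bound`, the table layout `L[k][j]` (holding $L_j^k$) and the list `logdet` (holding $\log|\Sigma_k|$) are hypothetical, and all values are assumed to have been computed beforehand from the steady-state Riccati and Lyapunov solutions. The factor 1/2 is taken from $\alpha_j = \frac{1}{2}[L_j^k - \log|\Sigma_k| - \ell]$.

```python
def rate_lower_bound(L, logdet, ell):
    """Lower bound on the rates alpha_j when the true parameter is unknown.

    L[k][j]   : value of L_j^k (criterion of model j when model k is true)
    logdet[k] : log|Sigma_k| for model k
    ell       : dimension of the observation vector
    Returns None if no candidate k passes the screening test (5.36).
    """
    candidates = []
    for k in range(len(L)):
        # Margins L_j^k - log|Sigma_k| - ell for every rival model j != k.
        margins = [L[k][j] - logdet[k] - ell
                   for j in range(len(L)) if j != k]
        # Discard k if (5.36) fails: such a k cannot be the true parameter.
        if margins and all(m > 0 for m in margins):
            candidates.append(min(margins))
    if not candidates:
        return None
    # alpha_j is half the margin, so the bound is half the least minimum.
    return 0.5 * min(candidates)


# Hypothetical three-model example with scalar observations (ell = 1).
logdet = [0.0, 0.5, 1.0]
L = [[1.0, 1.8, 2.4],
     [2.1, 1.5, 1.9],
     [1.9, 2.6, 2.0]]
alpha = rate_lower_bound(L, logdet, 1)
```

In this made-up table the third model fails (5.36) and is discarded, so the bound is determined by the first two candidates only.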
The above discussion is summarized in the following theorem.

Theorem 5.5

Consider the system (5.1) and suppose that its true model belongs
to the set M given by (5.2). Then under condition (c5.3) we have

    $$ E_r h_j^r(Z^n) \ge e^{(n+1)\alpha} $$                                   (5.37)

    $$ E_r \frac{f^h(r|Z^n)}{f^h(j|Z^n)} \ge \frac{f^h(r)}{f^h(j)} e^{(n+1)\alpha} $$    (5.38)

for each $j \in K$; $j \ne r$, where r is the true parameter and where
$\alpha$ is given by (5.35).

As discussed in section 4.4 the bounds (5.37) and (5.38) provide
performance measures for the ML and the MAP estimation methods. We
have shown that bounds on the convergence rates of the indicated
ratios can actually be computed for stationary Gaussian linear systems.


5.4  Model  Selection 

In practice, when a mathematical model of a dynamical system is
required for purposes of estimation and control, one often knows, to
a certain approximation, what the model should be. However, because of
implementation constraints one has to select a different model. Such is
the case when the actual system is of high order, but the available
computation and storage capabilities are such that only a low order
model can be handled. Another modeling problem arises when the actual
system's model is known to take its values, which may be time-varying,
from a given set, but only a single model can be considered. An example
of practical significance is the dynamical model of an aircraft, whose
parameters vary considerably over its flight envelope. However, the
airborne computation and storage capabilities are limited and a single
model of the aircraft dynamics must be used throughout its operation.

These are not identification problems in the strict sense.
Nevertheless the analysis in Chapter 3 and sections 5.1 and 5.2 suggests a
natural extension of the results to the model selection problems
introduced above. It should be emphasized that, unlike the investigation of
convergence and consistency of parameter estimates, the results of this
section apply to infinite and even non-compact parameter sets.

5.4.1  The Selection of a Reduced Order Model

Suppose that the true system or an approximate model of it is
known, but its dimension is too high for implementation of estimation
and control procedures. A model of lower dimension is then desired.
Let the true system, or an approximate model of it, be given by (5.1) and
let

    $$ M \triangleq \{ (F_s, G_s, H_s, Q_s, R_s) \ ;\ s \in S \} $$    (5.39)

be a model set of dimension lower than that of (5.1). The system
coefficients in M depend on a parameter vector s belonging to a parameter
set S. It is desired to find the model in the set M which is closest
to the true system $(F_*, G_*, H_*, Q_*, R_*)$ in some meaningful distance
sense.

For each $s \in S$ let

    $$ L_s^* = \log|\Sigma_s| + \operatorname{tr} \Sigma_s^{-1} \Gamma_s^* $$    (5.40)

where

    $$ \Gamma_s^* = E_* \{ (z_n - \hat{z}_{s,n})(z_n - \hat{z}_{s,n})^T \} $$

$\hat{z}_{s,n}$ is the one-step least-squares prediction of $z_n$ given the past
observations $Z^{n-1}$ assuming that s is the true parameter value, and
$\Sigma_s$ is the corresponding prediction error covariance matrix. $\Sigma_s$ is
obtained by running a Riccati equation corresponding to the model
$(F_s, G_s, H_s, Q_s, R_s)$ to steady-state. The computation of $\Gamma_s^*$ was
discussed in the previous section. Let $s^0 \in S$ be a parameter which
satisfies the following criterion

    $$ L_{s^0}^* \le \{ L_s^* \ ;\ s \in S,\ s \ne s^0 \} $$

Then, following the reasoning of section 5.2, the model
$(F_{s^0}, G_{s^0}, H_{s^0}, Q_{s^0}, R_{s^0})$ satisfies the following equivalent criteria:

1)  The model which is closer to the true model than any
other model in M in the sense $d(*;s^0) \le \{ d(*;s) ;\ s \in S \}$.

2)  The model which would be favored over any other model
in M by the incoming information.

3)  The model which would be identified as the true model among
any finite set of models from the set M by the maximum
likelihood and Bayesian estimation techniques.

The model selection problem then reduces to the minimization problem

    $$ \min \{ L_s^* \ ;\ s \in S \} $$    (5.41)

We do not address the algorithmic problem of solving (5.41) or the
existence of a unique minimum of $L_s^*$ on S. These problems are suggested for
further research.
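
As a rough illustration of (5.40) and (5.41), the sketch below evaluates the criterion for scalar observations and picks the minimizer over a finite list of candidates. The restriction to the scalar case, the finite candidate list, the function names `criterion` and `select_model`, and the numerical values are all assumptions made for the example; the text itself leaves the algorithmic solution of (5.41) open.

```python
import math


def criterion(sigma, gamma):
    """Scalar form of (5.40): L = log(sigma) + gamma / sigma."""
    return math.log(sigma) + gamma / sigma


def select_model(candidates):
    """Return the label s minimizing L_s^* over a finite candidate set.

    candidates maps a label s to a pair (sigma_s, gamma_s), where sigma_s
    is the prediction error variance assumed by model s and gamma_s is the
    actual mean-square prediction error of model s under the true system.
    """
    return min(candidates, key=lambda s: criterion(*candidates[s]))


# Hypothetical values; in practice they would come from the steady-state
# Riccati (sigma_s) and Lyapunov (gamma_s) solutions.
candidates = {"a": (1.0, 2.0), "b": (2.0, 2.0), "c": (4.0, 2.5)}
best = select_model(candidates)
```

For a fixed actual error `gamma`, the scalar criterion is smallest when the assumed variance `sigma` equals `gamma`, which is why candidate "b" is preferred here.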

5.4.2  The  Selection  of  a Representative  Model 

Suppose that the model of a linear system whose parameters may be
time-varying is known to take its values from a set

    $$ M \triangleq \{ (F_s, G_s, H_s, Q_s, R_s) \ ;\ s \in S \} $$

Two different cases may be considered.

1)  The model takes a certain constant value in the set M
and there is no prior knowledge, even in a probabilistic
sense, on what value it might be.

2)  During the system's operation its mathematical model
varies over the model set M. However, it is not
possible to consider the model's time program.

In either case it is desired to select a single model from the set M
to represent the system throughout its operation. One criterion for
the selection of such a model is that the maximum possible distance d
between the representing model and the true model (whatever it might be)
will be minimal.

The procedure for selecting the representative model from M will
then be as follows. First, for each parameter $s \in S$ find the parameter
t whose distance from s is maximal, and the corresponding maximum
distance. Then find the parameter s for which the maximal distance found
in the first step is minimal.

The distance between a parameter s and the parameters t of the set
S is maximized over t by maximizing with respect to t

    $$ -\log|\Sigma_t| + \operatorname{tr} \Sigma_s^{-1} \Gamma_s^t $$    (5.42)

where, as before,

    $$ \Sigma_s = E_s \{ (z_n - \hat{z}_{s,n})(z_n - \hat{z}_{s,n})^T \} $$

is obtained by running a Kalman-Bucy filter to steady-state, and

    $$ \Gamma_s^t = E_t \{ (z_n - \hat{z}_{s,n})(z_n - \hat{z}_{s,n})^T \} $$

is obtained by running a Lyapunov equation to steady-state, as shown in
the previous section.

The representative model is then found by solving the minimax problem

    $$ \min_s \max_t \{ L_s^t \ ;\ s, t \in S \} $$    (5.43)

The uniqueness of the solution of (5.43) is suggested for further
research.
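
The two-step selection procedure can be sketched for a finite grid of parameters as follows. This is only a sketch under assumptions: the finite grid, the function name `representative`, and the placeholder distance used in the usage line are hypothetical, whereas in the text the distance between models is the information distance obtained from the steady-state filter quantities.

```python
def representative(params, dist):
    """Minimax choice over a finite grid: argmin over s of max over t.

    params : list of candidate parameter values (a finite grid, assumed)
    dist   : function (s, t) -> distance of model t from model s
    Returns (best_s, worst_case_distance).
    """
    best_s, best_worst = None, float("inf")
    for s in params:
        # Step 1: the largest distance from s to any possible true model t.
        worst = max(dist(s, t) for t in params)
        # Step 2: keep the s whose worst case is smallest.
        if worst < best_worst:
            best_s, best_worst = s, worst
    return best_s, best_worst


# Toy usage with a placeholder distance (absolute difference).
s_rep, d_rep = representative([0.0, 1.0, 4.0], lambda s, t: abs(s - t))
```

The distance function is not assumed to be symmetric, so `dist(s, t)` is always called with the representing model first and the possible true model second.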
Remarks

1)  The procedures described in this chapter find, in general, a model
in the model set whose output (or observation) statistics are best
matched with those of the true system. However, for the modeling
problems considered above, the role of the output can be played by any
linear function of the state variables. If, for instance, it is desired to
emphasize certain variables that affect the system's performance more
than the others, or that can be measured better than the others,
then these variables can be selected as outputs for the model selection
procedures described above.

2)  The problem of selecting a single model from a model set, considered
in sections 5.4.1 and 5.4.2, can be generalized to a problem of selecting
a number of models from the set, so that the model set is approximately
represented by a finite set of models. An identification procedure can
then be employed "on-line" to find the model in the finite set which is
closest to the true system. The selection of a finite model set would
require, as a first step, the division of the infinite parameter set
into a finite number of subsets. The way in which the parameter space
should be divided would depend on considerations of the physical
problem involved, but it seems obvious that the division could employ the
metric topology of the parameter space introduced in Chapter 3. (Just
as interval lengths are used, say, to divide a rectangle into
equal parts.) The selection of a representative model for each subset
is then performed as described in sections 5.4.1 and 5.4.2 above.
Further research of this seemingly promising approach to system modeling
and identification problems is recommended.


CHAPTER  VI 


NON-STATIONARY  LINEAR  SYSTEMS

The assumption of stationarity made in the previous chapter is now
removed, as we consider the general case of non-stationary,
time-varying linear systems. We first derive expressions for the information
in the observations, discriminating one model in the model set against
another. The information conditions for the consistency of the
estimates are interpreted in terms of the second-order statistics associated
with the different models and computed by solving the corresponding
Riccati equations (or, equivalently, running Kalman-Bucy filters). The
consistency result for time varying systems is not, however, as
explicit as in the stationary case. The convergence of the likelihood
and the a posteriori probability ratios is investigated. The separate
contributions of the stochastic and the deterministic parts of the
input to the information and, consequently, to the convergence rates
are shown.


6.1  Models 


Consider the system

    $$ x_{n+1} = F_{*,n} x_n + G_{*,n} w_n $$

    $$ z_n = H_{*,n} x_n + v_n $$                                   (6.1a)

initialized at $n = n_0$ with

    $$ E_* x_{n_0} = 0 \ ;\quad E_* x_{n_0} x_{n_0}^T = \Psi_* $$

where $(w_n)$ and $(v_n)$ are uncorrelated and mutually uncorrelated Gaussian
sequences with

    $$ E w_n = E v_n = 0 $$

    $$ E\{ w_n w_n^T \} = Q_{*,n} \ ;\quad E\{ v_n v_n^T \} = R_{*,n} $$    (6.1b)

Consider a finite set of families of models

    $$ M_2 \triangleq \{ (F_{j,n}, G_{j,n}, H_{j,n}, \Psi_j, Q_{j,n}, R_{j,n}) \ ;\ j \in K = \{0, 1, \dots, p\} \} $$    (6.2)

Let $(z_n)$ be an $\ell$-dimensional observation sequence. The conditional
probability density of $z_n$ given the past observations $Z^{n-1}$ and
corresponding to each model is given by

    $$ f_j(z_n|Z^{n-1}) = [(2\pi)^\ell |\Sigma_{j,n}|]^{-1/2} \exp\{ -\frac{1}{2} (z_n - \hat{z}_{j,n})^T \Sigma_{j,n}^{-1} (z_n - \hat{z}_{j,n}) \} \ ;\quad j \in K $$    (6.3)

where, as before, $\hat{z}_{j,n}$ is the one-step prediction of $z_n$ given the past
observations $Z^{n-1}$, assuming that the j'th model is the true one (i.e.
assuming that the observations are generated by the j'th model), and
$\Sigma_{j,n}$ is the corresponding error covariance matrix. Both $\hat{z}_{j,n}$ and
$\Sigma_{j,n}$ are generated by a Kalman-Bucy filter corresponding to the j'th model.


6.2  Information,  Convergence  and  Consistency 

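The information in a single observation $z_n$, favoring the k'th
model against the j'th model, will now be derived.

    $$ I_n(k;j) = E_*^{n-1} \log \frac{f_k(z_n|Z^{n-1})}{f_j(z_n|Z^{n-1})} $$

    $$ = \frac{1}{2} \log \frac{|\Sigma_{j,n}|}{|\Sigma_{k,n}|} + E_*^{n-1} \{ -\frac{1}{2} (z_n - \hat{z}_{k,n})^T \Sigma_{k,n}^{-1} (z_n - \hat{z}_{k,n}) + \frac{1}{2} (z_n - \hat{z}_{j,n})^T \Sigma_{j,n}^{-1} (z_n - \hat{z}_{j,n}) \} \quad \text{a.e.} $$

Using

    $$ E_*^{n-1} \{ z_n z_n^T \} = \Sigma_{*,n} + \hat{z}_{*,n} \hat{z}_{*,n}^T $$

this becomes

    $$ I_n(k;j) = \frac{1}{2} \log \frac{|\Sigma_{j,n}|}{|\Sigma_{k,n}|} + \frac{1}{2} \operatorname{tr} \Sigma_{*,n} (\Sigma_{j,n}^{-1} - \Sigma_{k,n}^{-1}) $$
    $$ \quad - \frac{1}{2} (\hat{z}_{*,n} - \hat{z}_{k,n})^T \Sigma_{k,n}^{-1} (\hat{z}_{*,n} - \hat{z}_{k,n}) + \frac{1}{2} (\hat{z}_{*,n} - \hat{z}_{j,n})^T \Sigma_{j,n}^{-1} (\hat{z}_{*,n} - \hat{z}_{j,n}) \quad \text{a.e.} $$    (6.4)

Let

    $$ I_n^{(1)}(k;j) = \frac{1}{2} \log \frac{|\Sigma_{j,n}|}{|\Sigma_{k,n}|} + \frac{1}{2} \operatorname{tr} \Sigma_{*,n} (\Sigma_{j,n}^{-1} - \Sigma_{k,n}^{-1}) $$    (6.5)

    $$ I_n^{(2)}(k;j) = -\frac{1}{2} (\hat{z}_{*,n} - \hat{z}_{k,n})^T \Sigma_{k,n}^{-1} (\hat{z}_{*,n} - \hat{z}_{k,n}) + \frac{1}{2} (\hat{z}_{*,n} - \hat{z}_{j,n})^T \Sigma_{j,n}^{-1} (\hat{z}_{*,n} - \hat{z}_{j,n}) $$    (6.6)

Then

    $$ I_n(k;j) = I_n^{(1)}(k;j) + I_n^{(2)}(k;j) $$    (6.7)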


Suppose that condition (c4.2) is satisfied. Then, by lemma 4.2
and by theorems 4.1 through 4.4, conditions (c4.3) or (c4.4) are sufficient
for convergence of the estimates to the k'th model in $M_2$. However, it
is not difficult to see that the verification of conditions (c4.3) or
(c4.4) is not possible for the general case considered here under any
conditions imposed on the deterministic part $I_n^{(1)}(k;j)$, due to the
random part $I_n^{(2)}(k;j)$. In section 6.3 we shall show that $I_n^{(2)}(k;j)$ can be
further separated into deterministic and stochastic parts. We now show
that under the assumption that the true model belongs to the given set
$M_2$ the information expression for the time varying system under
consideration is simplified and consequently more explicit conditions for
identification can be obtained.


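Suppose that some $k \in K$ is the true parameter. Then by (6.4)

    $$ I_n(k;j) = I_n'(k;j) + I_n''(k;j) $$    (6.8)

where

    $$ I_n'(k;j) = \frac{1}{2} \Big[ \log \frac{|\Sigma_{j,n}|}{|\Sigma_{k,n}|} + \operatorname{tr} (\Sigma_{j,n}^{-1} \Sigma_{k,n} - I) \Big] $$    (6.9)

and

    $$ I_n''(k;j) = \frac{1}{2} (\hat{z}_{k,n} - \hat{z}_{j,n})^T \Sigma_{j,n}^{-1} (\hat{z}_{k,n} - \hat{z}_{j,n}) $$    (6.10)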


Consider the following condition

(c6.1)  For some $k \in K$ and for each $j \in K$; $j \ne k$ there exists some
scalar $\alpha_j > 0$ and a subsequence $(n_j)$ of $(n)$ such that

    $$ \| \Sigma_{k,n_j} - \Sigma_{j,n_j} \| \ge \alpha_j \quad \text{for all } n_j $$    (6.11)

where

    $$ \|A\| = |\det A| $$


Lemma 6.1

Let some $k \in K$ be the true parameter, i.e. let

    $$ (F_{*,n}, G_{*,n}, H_{*,n}, \Psi_*, Q_{*,n}, R_{*,n}) = (F_{k,n}, G_{k,n}, H_{k,n}, \Psi_k, Q_{k,n}, R_{k,n}) $$

Then condition (c6.1) implies condition (c4.4) for k.


Proof 


Clearly

    $$ I_n''(k;j) \ge 0 \quad \text{for all } n \text{ for each } j \in K $$

Thus

    $$ I_n(k;j) \ge I_n'(k;j) \quad \text{for all } n \text{ for each } j \in K $$

It will suffice then to show that condition (c6.1) implies the existence
of a subsequence $(n_j')$ of $(n_j)$ and some $\varepsilon_j > 0$ such that

    $$ I_{n_j'}'(k;j) \ge \varepsilon_j \quad \text{for all } n_j' $$    (6.12)

Consider the following equation

    $$ | \Sigma_{k,n} - \lambda_n \Sigma_{j,n} | = 0 $$    (6.13)

For positive definite $\Sigma_{k,n}$ and $\Sigma_{j,n}$ there exists a nonsingular matrix $A_n$
such that (Anderson, [1956], p.341):

    $$ A_n^T \Sigma_{k,n} A_n = \Lambda_n $$    (6.14)

and

    $$ A_n^T \Sigma_{j,n} A_n = I $$    (6.15)

where $\Lambda_n$ is a diagonal matrix whose elements are $\lambda_{n,i}$; $i = 1, \dots, \ell$, the
roots of (6.13). In addition, we have $\lambda_{n,i} > 0$ for all $i = 1, \dots, \ell$ and
$n \ge 0$.

It is easy to verify that $I_n'(k;j)$ remains invariant under the
transformations (6.14) and (6.15). Hence

    $$ I_n'(k;j) = -\frac{1}{2} \log|\Lambda_n| + \frac{1}{2} \operatorname{tr} (\Lambda_n - I) = \frac{1}{2} \sum_{i=1}^{\ell} (\lambda_{n,i} - \log \lambda_{n,i} - 1) $$    (6.16)


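Suppose that for some subsequence $(n_j)$ of $(n)$

    $$ \| \Sigma_{k,n_j} - \Sigma_{j,n_j} \| \ge \alpha_j > 0 \quad \text{for all } n_j $$    (6.17)

Then there exists some $\varepsilon_j > 0$ and a subsequence $(n_j')$ of $(n_j)$ such that

    $$ | \lambda_{n_j',i} - 1 | \ge \varepsilon_j \quad \text{for all } n_j' \text{ for each } i = 1, \dots, \ell $$    (6.18)

since if such $\varepsilon_j$ and such $(n_j')$ do not exist, then

    $$ \lambda_{n_j, i_{n_j}} \to 1 \quad \text{as } n_j \to \infty $$

where

    $$ | \lambda_{n_j, i_{n_j}} - 1 | = \min \{ | \lambda_{n_j,i} - 1 | \ ;\ i = 1, \dots, \ell \} $$

and then

    $$ \| \Sigma_{k,n_j} - \Sigma_{j,n_j} \| \to 0 $$    (6.19)

as $n_j \to \infty$, contradicting (6.17). Hence, (6.17) implies (6.18).
Now consider (6.16). Since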


    $$ a - \log a - 1 \ge 0 $$    (6.20)

with equality if and only if $a = 1$, and since the function on the left
hand side of (6.20) is convex in a, it follows that given $\varepsilon > 0$ there
exists some $\alpha > 0$ such that

    $$ a - \log a - 1 \ge \alpha \quad \text{whenever } |a - 1| \ge \varepsilon $$

Thus, finally, (6.18) implies that there exists some $\varepsilon_j > 0$ such that

    $$ I_{n_j'}'(k;j) \ge \varepsilon_j \quad \text{for all } n_j' $$    (6.21)

The assertion follows.
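
The role of (6.16) and (6.20) in this proof can be checked numerically. The sketch below is an illustration only: the function names `g`, `info_from_roots` and `gap_bound` and the sample roots are hypothetical. It evaluates $I_n'(k;j)$ from a given list of roots of (6.13) and computes the constant $\alpha$ that convexity guarantees when every root stays at least $\varepsilon$ away from 1.

```python
import math


def g(a):
    """Left hand side of (6.20): a - log(a) - 1, defined for a > 0."""
    return a - math.log(a) - 1.0


def info_from_roots(roots):
    """I'_n(k;j) computed from the roots of (6.13), using (6.16)."""
    return 0.5 * sum(g(lam) for lam in roots)


def gap_bound(eps):
    """Smallest value of g(a) subject to |a - 1| >= eps, with a > 0.

    g is convex with its minimum at a = 1, so the constrained minimum is
    attained at a boundary point: 1 + eps, or 1 - eps when that is positive.
    """
    values = [g(1.0 + eps)]
    if eps < 1.0:
        values.append(g(1.0 - eps))
    return min(values)


roots = [0.4, 1.7, 2.5]          # hypothetical roots, each at least 0.5 from 1
bound = 0.5 * len(roots) * gap_bound(0.5)
value = info_from_roots(roots)
```

With all roots equal to 1 the information vanishes, which corresponds to the two covariance matrices being identical.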
We have shown in Chapter 4 that consistency of the parameter
estimates (or, equivalently, identifiability of the dynamical system)
follows from conditions (c4.2) and (c4.4). Condition (c4.4) (or, more
generally, (c4.3)) seems to be, for obvious reasons, the "crucial"
condition for the strong consistency of the estimates. We show below
that condition (c4.2) holds for the case of time invariant stationary
linear systems. It seems, however, that condition (c4.2) would hold
for very general classes of observation sequences. For the general
case of time varying systems we condition the consistency result on
condition (c4.2), which has to be checked for each case under
consideration. It seems, in particular, that condition (c4.2) would not be
difficult to verify for the class of periodically varying linear systems
and for systems driven by bounded deterministic inputs. This, however,
is left for future research.

Theorem 6.1

Suppose that the system (6.1) belongs to the set $M_2$ specified by
(6.2). Furthermore, suppose that condition (c4.2) holds. Then the
system is identifiable a.e. by the ML and the MAP estimates and
identifiable a.e. and in m.s. by the LS estimate on the set $M_2$ if condition (c6.1)
is satisfied.

Proof

The assertion follows from lemma 6.1 and theorem 4.6.
Now consider the case, treated in Chapter 5, where the true
system, given by (5.1), is assumed to belong to the set M given by (5.2).
Under conditions (c5.1) and (c5.2) condition (c6.1) simplifies to the
following condition:

(c6.2)  For each $j \in K$; $j \ne k$

    $$ \| \Sigma_j - \Sigma_k \| \ne 0 $$

Suppose that $k \in K$ is the true parameter. We have for each $j \in K$;
$j \ne k$

    $$ I_n(k;j) = \frac{1}{2} \log|\Sigma_j| + \frac{1}{2} \operatorname{tr} \Sigma_j^{-1} E_*^{n-1} \{ (z_n - \hat{z}_{j,n})(z_n - \hat{z}_{j,n})^T \} $$
    $$ \quad - \frac{1}{2} \log|\Sigma_k| - \frac{1}{2} \operatorname{tr} \Sigma_k^{-1} E_*^{n-1} \{ (z_n - \hat{z}_{k,n})(z_n - \hat{z}_{k,n})^T \} $$    (6.22)

where, for each $j \in K$

    $$ E_*^{n-1} \{ (z_n - \hat{z}_{j,n})(z_n - \hat{z}_{j,n})^T \} = E_*^{n-1} \{ z_n z_n^T \} - \hat{z}_{*,n} \hat{z}_{j,n}^T - \hat{z}_{j,n} \hat{z}_{*,n}^T + \hat{z}_{j,n} \hat{z}_{j,n}^T $$
    $$ \quad = \Sigma_* + (\hat{z}_{*,n} - \hat{z}_{j,n})(\hat{z}_{*,n} - \hat{z}_{j,n})^T $$    (6.23)


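and, since k = *,

    $$ J_n(k;j) = \log h_j^k(z_n|Z^{n-1}) - I_n(k;j) $$
    $$ \quad = -\frac{1}{2} [ (z_n - \hat{z}_{k,n})^T \Sigma_k^{-1} (z_n - \hat{z}_{k,n}) - \ell ] + \frac{1}{2} [ (z_n - \hat{z}_{j,n})^T \Sigma_j^{-1} (z_n - \hat{z}_{j,n}) - \operatorname{tr} \Sigma_j^{-1} E_*^{n-1} \{ (z_n - \hat{z}_{j,n})(z_n - \hat{z}_{j,n})^T \} ] $$    (6.24)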


Since the sequences $(z_n - \hat{z}_{j,n})$ and $(\hat{z}_{k,n} - \hat{z}_{j,n})$ are ergodic for all
$j, k \in K$, so are the sequences $(I_n(k;j))$ and $(J_n(k;j))$. It follows
from lemma 4.2 that condition (c4.2) is satisfied. Condition (c4.3) is
satisfied if condition (c5.4) is satisfied, by theorem 3.1 and the
ergodicity of $(I_n(k;j))$. The identifiability of the system under
condition (c5.4) thus follows from theorem 4.5.


6.3  Convergence 

We have shown in section 4.4 that by bounding the information in the
observations away from zero, bounds on the convergence rates of the
likelihood and the a posteriori probability ratios can be established,
which in turn provide performance measures for the ML and the MAP
estimation procedures. In this section we consider the identification of
a general class of time-varying systems driven by stochastic and
deterministic inputs. The fact that only convergence in L1, and not in the
stronger senses of a.e. and m.s., is sought enables us to obtain rather
explicit results. The stochastic and the deterministic parts of the
input are shown to contribute separately to the convergence rates of the
identification procedures.
Consider the system

    $$ x_{n+1} = F_{*,n} x_n + G_{*,n} u_n + J_{*,n} w_n $$

    $$ z_n = H_{*,n} x_n + v_n $$                                   (6.25)

where $(u_n)$ is a deterministic (known) input sequence and the other
elements are as specified in section 6.1. Also consider a model set

    $$ M_3 = \{ (F_{j,n}, G_{j,n}, J_{j,n}, H_{j,n}, \Psi_j, Q_{j,n}, R_{j,n}) \ ;\ j \in K \} $$    (6.26)

where $Q_{j,n}$ and $R_{j,n}$ are the covariance matrices of $(w_n)$ and $(v_n)$
respectively, corresponding to each model.

The incremental information for favoring a parameter k over a
parameter j in the set K is given by (6.8). For each $j \in K$ we have

    $$ I_n''(*;j) = \frac{1}{2} (\hat{z}_{*,n} - \hat{z}_{j,n})^T \Sigma_{j,n}^{-1} (\hat{z}_{*,n} - \hat{z}_{j,n}) $$    (6.27)

where

    $$ (\hat{z}_{*,n} - \hat{z}_{j,n}) = H_{j,n}^* \hat{x}_{j,n}^* $$    (6.28)

    $$ H_{j,n}^* = (H_{*,n}, \ -H_{j,n}) $$    (6.29)

    $$ \hat{x}_{j,n}^* = (\hat{x}_{*,n}^T, \ \hat{x}_{j,n}^T)^T $$    (6.30)
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For  each  j e K'  we  have 


i.  . , ■ F.  (I  - K.  H.  ) fi.  + G . u 
j#n+l  3,n  jfn  3,n  j,n  j,n  n 


+ FKHx+  FKv  (6.31) 

jin  j,n  **n  n j#n  j#n  n 


where 


K.  = 2,  H T (H.  £.  H.T  + R.  f1 

3,n  j*n  3,n  3,n  3,n  3,n  3#n 


JC.  = nj c.  . , » F.  (I  - K.  H.  )£.  + G . U 

j,n+l  * 3 ,n+l  3,n  3#n  3»n  3,n  3,n  n 


+ F.  K.  H#  xA 
j,n  3,n  *,n  *,n 


where 


X*  _ = E .X  * 


U , 

_ _ n-1 

E*E*  xn 


Also  let: 


Xj,n+1 


Si.  , _ - x.  . « F.  (I  - K.  H.  )X. 

3,n+l  3*n+l  3,n  3,n  ?,n  3*n 


+ F.  K.  H.  (x  - x*  ) 
3,n  3,n  *,n  n *,n 


+ F.  K.  v 
3,n  3 ,n  n 


, = x - x.  . . = F.  x . + J.  w 

',n+l  n+1  *,n+l  *,n  *,n  *,n  r 
Now let


where 


*j »n  5 Hj,n  Xj,n 


> ,/v  T c T T 

xj,n  = X**n  ' Xj *n 


(6.32) 


(6.33) 


and  let 


where 


T H H*  X* 

j,n  3 »n  j,n 


= /?  T 5 T )T 

j,n  " lx*,n'  Xj f« 


(6.34) 


(6.35) 


Then we can write


X"  (*; j) 
n 


1 (a*  w*  .T  <p“l  /"*  * \ 

2 <Zj,n  + Zj,n}  »n(Zj  #n  + Zj ,n 

1 >T  r-i  > . i e-i  t + i*T  IT1  3* 

2 Zj»n  j,n  Zj,n  2 j,n  j,n  j,n  j ,n  j,n  j#n 


Let 


V 

I 

¥ 


and 


- - T 

* _ A*  A* 

= z,  z. 

3#n  j,n  3,n 


* _ ■** 

V.  = z . z. 

3,n  D#n  3,n 


* _ /v* 

G.  = z . z . 

3«n  J,n  3,n 


(6.36) 


(6.37) 


(6.38) 


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Then  we  have 


s i*!j>  ■ * **  c 


(6.39) 


We  shall  use 


= £„$* 

j#n  * j,n 

which  is  obtained  via  the  following  procedure. 


Define 


x.  = (x.  , x.  , x.  ) 

3,n  *#n  *,n  ;j,n 


(6.40) 


«.* 

Then  x.  is  generated  by  the  following  equation 
3 #n 


* 

X.  « F.  x.  + G . w 
j,n+l  3,n  j,n  j n 


(6.41) 


where 


3»n 


F . (I  - K*  H*  ) 
* n * n * . n 


F*  K*  H*  _ _ _ . 

*,n  *,n  *,n  w,n  w,n  w,n 


F.  K.  H. 
3»n  },n  *,n 


F.  (I  - K,  H ) 
3,n  j,n  h,n 


(6.42) 


* 

G. 

3»n 


0 F.  K. 

* »n  *,n 


0 F.  K. 

3»n  j ,n 


W = 
n 


w 


n 


v 


n 


(6.43) 
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Also  let 


Then 


i A a ni 

*3,n  ■ ««•  «•  *,„>* 


where 


,*  _ ( -*  ■ ~*T  \ 


is generated by the equation


n* 


-4  - F.  n.  F,  + <3 . 0 Gj 

3,n+l  3,n  j,n  j,n  j,n  j,n 

initialized  at 


.T 


n. 


0 

0 


0 0 » 


Next  consider  V.  . we  have 

'■  j 


(6.44) 


(6.45) 


(6.46) 


(6.47) 
Note that while $\Sigma_{j,n}$ and $\Sigma_{k,n}$ (or $I_n'(k;j)$) depend only on the
stochastic part of the input, the term $V_{j,n}$ represents its deterministic
part. In the sequel we examine the separate contributions of these
elements to the convergence rates of the likelihood ratios and the
a posteriori probability ratios on the set K.


Theorem 6.2

Suppose that the true system (6.25) belongs to the set $M_3$ given
by (6.26). Let $k \in K$ be the true parameter. Suppose that for each
$j \in K$; $j \ne k$ there exist a positive scalar $\alpha_j$ and a positive integer
$N_j$ such that

    $$ \| \Sigma_{k,n} - \Sigma_{j,n} \| \ge \alpha_j \quad \text{for all } n \ge N_j $$    (6.52)

Then we have for each $j \in K$; $j \ne k$

    $$ E_* h_j^k(Z^n) \ge e^{(n - N_j + 1)\alpha_j} \quad \text{for all } n \ge N_j $$

and

    $$ E_* \frac{f^h(k|Z^n)}{f^h(j|Z^n)} \ge \frac{f^h(k)}{f^h(j)} e^{(n - N_j + 1)\alpha_j} \quad \text{for all } n \ge N_j $$

Proof

The proof follows from arguments similar to those made in the proof
of lemma 6.1. We first show that (6.52) implies that there exists
some $\varepsilon > 0$ such that

    $$ | \lambda_{n,i} - 1 | \ge \varepsilon \quad \text{for each } i = 1, \dots, \ell, \text{ for all } n \ge N_j $$    (6.53)

where $\lambda_{n,i}$; $i = 1, \dots, \ell$ are the solutions of (6.13). Suppose that (6.53)
does not hold. Then for any $\varepsilon > 0$ there exists some $n_\varepsilon \ge N_j$ such that

    $$ | \lambda_{n_\varepsilon, i_{n_\varepsilon}} - 1 | < \varepsilon $$    (6.54)

where

    $$ | \lambda_{n, i_n} - 1 | = \min \{ | \lambda_{n,i} - 1 | \ ;\ i = 1, \dots, \ell \} $$

and then, by continuity of the left hand side of (6.13) in $\lambda_n$, given
$\alpha_j > 0$ one can take $\varepsilon$ such that

    $$ \| \Sigma_{k,n_\varepsilon} - \Sigma_{j,n_\varepsilon} \| < \alpha_j $$

contradicting (6.52). Hence (6.52) implies (6.53). Now by (6.16) and
by the convexity of (6.20) we have that (6.53) implies that for each
$j \in K$; $j \ne k$ there exists some $\alpha_j > 0$ such that

    $$ I_n'(k;j) \ge \alpha_j \quad \text{for all } n \ge N_j $$

Since $I_n''(k;j) \ge 0$ we then have

    $$ I_n(k;j) \ge \alpha_j \quad \text{for all } n \ge N_j $$

Condition (c4.5) is then satisfied and the assertion follows from
equations (4.11) and (4.12) in the proof of theorem 4.11.


Corollary 6.1

Let the set $M_2$ be time invariant and let the true system belong
to $M_2$. Suppose that for each $j \in K$, $\Sigma_j$ given by (6.23) is finite and
non-singular. Then the L1 convergence bounds asserted in theorem
6.2 hold under condition (c6.2), where k is the true parameter.

Proof

Condition (c6.2) implies that for each $j \in K$; $j \ne k$ there exists
some $c_j > 0$ such that

    $$ \| \Sigma_j - \Sigma_k \| \ge c_j $$

Hence, for any positive scalar $\alpha_j$ such that $0 < \alpha_j < c_j$ there exists
some positive integer $N_j$ such that

    $$ \| \Sigma_{j,n} - \Sigma_{k,n} \| \ge \alpha_j \quad \text{for all } n \ge N_j $$

The assertion then follows from theorem 6.2.

Theorem 6.3

Suppose that the true system (6.25) belongs to the set $M_3$ given by
(6.26). Let $k \in K$ be the true parameter. Suppose that for each $j \in K$;
$j \ne k$ there exists a positive scalar $\alpha_j$ and a positive integer $N_j$ such
that

    $$ \operatorname{tr} \Sigma_{j,n}^{-1} V_{j,n} \ge 2\alpha_j \quad \text{for all } n \ge N_j $$    (6.55)

Then the convergence rates asserted in theorem 6.2 hold.

Proof 

For each $j \in K$; $j \ne k$ we have

    $$ I_n'(k;j) \ge 0 \quad \text{for all } n \ge 0 $$

(this follows from (3.5); also see the proof of lemma 6.1) and

tx  (t1*  >0  for  all  n > 0 

3 «n  },n  - - 
    $$ I_n(k;j) \ge \frac{1}{2} \operatorname{tr} \Sigma_{j,n}^{-1} V_{j,n} \ge \alpha_j \quad \text{for all } n \ge N_j $$

Condition (c4.5) is then satisfied and the assertion follows from
equations (4.11) and (4.12) in the proof of theorem 4.11.

Theorem 6.2 guarantees a certain convergence rate of the likelihood
ratios and the a posteriori probability ratios under a certain
condition involving the stochastic characteristics of the inputs to the
systems. Theorem 6.3 means that the convergence rates can be improved
by application of certain deterministic inputs, satisfying (6.55).
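
To get a feel for what the exponential bounds of theorems 6.2 and 6.3 imply, the following sketch evaluates the bound $e^{(n - N_j + 1)\alpha_j}$ and the number of observations needed for it to reach a given level. The function names `l1_bound` and `steps_to_reach` and the two rate values are hypothetical; the rates are not computed from any system here, they only illustrate how a larger $\alpha_j$, such as one obtained with a deterministic input satisfying (6.55), shortens the required record.

```python
import math


def l1_bound(n, alpha, N=0):
    """Exponential lower bound exp((n - N + 1) * alpha), valid for n >= N."""
    if n < N:
        raise ValueError("the bound is only asserted for n >= N")
    return math.exp((n - N + 1) * alpha)


def steps_to_reach(target, alpha, N=0):
    """Smallest n >= N for which the bound is at least `target` (> 1)."""
    n = N - 1 + math.ceil(math.log(target) / alpha)
    return max(n, N)


n_slow = steps_to_reach(100.0, 0.25)   # smaller assumed rate
n_fast = steps_to_reach(100.0, 0.50)   # larger assumed rate
```

Doubling the rate roughly halves the number of observations needed to reach the same bound.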


CHAPTER  VII 


SUGGESTIONS  FOR  FURTHER  RESEARCH 


7.1  Extension to Compact Parameter Sets

As mentioned in Chapter 1, the extension of parameter estimation
convergence results from finite to infinite sets can, in general, be
obtained via the addition of topological conditions on the parameter
set. Let S be a compact metric space with metric $\delta$. In the previous
sections we have studied conditions under which one has for some $r \in S$

(c7.1)  $\lim_{n \to \infty} h_r^s(Z^n) = 0$ a.e. for each $s \in S$; $s \ne r$    (7.1)

We have seen in Chapter 4 that if the true parameter is a member of the
parameter set, say, $* = r \in S$, then (c7.1) is implied by the following
conditions

(c7.2)  $\lim_{n \to \infty} \sum_{m=1}^{n} I_m(r;s) = \infty$ a.e. for each $s \in S$; $s \ne r$


and 

n 

lim  sup  ^^Jm(r?s)  > a.e.  for  each  s e S;  s ? r 

m=l 

The pointwise convergence in (7.1) is not sufficient for convergence
a.e. of, say, the ML estimates on S to r (although mistakenly
considered to be by several authors).
To obtain convergence a.e. of the estimates to r it must be shown that
for any open neighborhood V(r) of r one has

    $$ \lim_{n \to \infty} \sup_{s \in V^c(r)} h_r^s(Z^n) = 0 \quad \text{a.e.} $$    (7.2)

where $V^c(r)$ is the complement of V(r) in S. Consider the following
condition.

(c7.3)  At each $s \in S$ the ratios $h_r^s(Z^n)$ are continuous in s uniformly
in n. This means that for any realization of the sequence $(z_n)$, given
$\varepsilon > 0$ there exists for each $s \in S$ a neighborhood

    $$ V(s) = \{ t : |t - s| < \delta_s \} $$    (7.3)

for some $\delta_s > 0$, such that

    $$ \sup_{t \in V(s)} | h_r^t(Z^n) - h_r^s(Z^n) | < \varepsilon \quad \text{for all } n > 0 $$    (7.4)

Theorem  7,1 

Suppose  that  conditions  (c7.1)  and  (c7.3)  hold,  then  ML  estimates 
on  S converge  a.e.  to  the  parameter  r. 

Proof 

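Choose $\varepsilon < 1$. Then for each $s \in V^c(r)$ there exists an open
neighborhood V(s) satisfying (7.3) and (7.4). Since V(r) is open, $V^c(r)$ is
a closed subset of a compact set, hence compact. Thus, there exists a
finite number of points $s_i$, $i \in I$, such that

$V^c(r) \subset \bigcup \{ V(s_i) ; i \in I \}$.

Now

$\lim_{n \to \infty} \sup_{s \in V^c(r)} h_r^s(Z^n) \leq \lim_{n \to \infty} \max_i \{ \sup_{t \in V(s_i)} h_r^t(Z^n) ; i \in I \}$

$\leq \lim_{n \to \infty} \max_i \{ h_r^{s_i}(Z^n) + \varepsilon ; i \in I \}$

$= \max_i \{ \lim_{n \to \infty} [ h_r^{s_i}(Z^n) + \varepsilon ] ; i \in I \}$

$= \varepsilon < 1$  a.e.

But since

$\lim_{n \to \infty} \sup_{s \in S} h_r^s(Z^n) \geq \lim_{n \to \infty} h_r^r(Z^n) = 1$,

the supremum over S is eventually attained in V(r) and not in $V^c(r)$. As
V(r) is arbitrary, the ML estimates on S converge a.e. to r.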
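The conclusion of the theorem can be checked numerically on a simple case. The sketch below is an illustration under assumptions that are not taken from the thesis: observations are i.i.d. Gaussian with unit variance and mean r, the set S = [-1, 1] is replaced by a finite grid, and the names log_ratio and run are hypothetical. It only observes the behaviour of the supremum outside a neighborhood of r and of the grid ML estimate; it does not verify condition (c7.3).

```python
import math
import random

def log_ratio(s, r, total, n):
    # log h_r^s(Z^n) for i.i.d. N(s, 1) versus N(r, 1) observations
    # whose sum is `total`.
    return (s - r) * total - n * (s * s - r * r) / 2.0

def run(r=0.3, delta=0.1, n_max=20000, seed=0):
    rng = random.Random(seed)
    grid = [k / 100.0 for k in range(-100, 101)]  # grid standing in for S
    outside = [s for s in grid if abs(s - r) > delta - 1e-9]  # grid part of V^c(r)
    total = 0.0
    results = []
    for n in range(1, n_max + 1):
        total += rng.gauss(r, 1.0)
        if n in (10, 100, 1000, n_max):
            # Supremum of the log ratio outside the neighborhood of r.
            sup_log = max(log_ratio(s, r, total, n) for s in outside)
            # Grid ML estimate: the maximizer of the ratio over the grid.
            ml = max(grid, key=lambda s: log_ratio(s, r, total, n))
            results.append((n, sup_log, ml))
    return results

for n, sup_log, ml in run():
    print(n, sup_log, ml)
```

Working with the logarithm of the ratio avoids numerical underflow for large n; a strongly negative value corresponds to a ratio close to zero.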


The proof of convergence a.e. of MAP estimates on S to r is similar,
as by (2.11) we have

$\frac{f(s \mid Z^n)}{f(r \mid Z^n)} = \frac{f(s)}{f(r)} \, h_r^s(Z^n)$.
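The step from the ML to the MAP case can be spelled out as follows. Here f(s | Z^n) denotes the posterior density, f(s) the prior density and h_r^s(Z^n) the likelihood ratio. The sketch assumes, beyond what is stated in the text, that the prior density is bounded on S and that f(r) > 0; the constant M is introduced here only for illustration.

```latex
\[
\sup_{s \in V^c(r)} \frac{f(s \mid Z^n)}{f(r \mid Z^n)}
  = \sup_{s \in V^c(r)} \frac{f(s)}{f(r)} \, h_r^s(Z^n)
  \leq \frac{M}{f(r)} \sup_{s \in V^c(r)} h_r^s(Z^n),
\qquad M = \sup_{s \in S} f(s) < \infty .
\]
```

Under these assumptions it suffices to choose $\varepsilon < f(r)/M$ in the argument used for the ML estimates, so that the posterior density outside V(r) is eventually smaller than its value at r.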


Condition (c7.3) and its applicability to cases of interest are
suggested for further research. Two guiding questions seem to be:




of impulse responses of the system's innovations representations for
the identifiability of stationary linear systems. Similar conditions
were suggested by Tse and Weinert [1975] and, for the finite parameter
set case, by Moore and Hawkes [1974]. The advantages of statistical
uniqueness conditions such as the one suggested by Baram and Sandell
[1976] and in this thesis are that they apply to any given set of state
space models, rather than only to certain canonical representations of
the system, and that they are verifiable by standard computations (such
as the steady-state solutions of Riccati and Lyapunov equations). Their
disadvantage is that the actual parametrization of the system gets lost
in the statistical conditions. The homeomorphism condition presented
above seems to correspond to conditions (c7.2), which requires
uniqueness, and (c7.3), which requires continuity, put together. A more
elaborate investigation of the correspondence between these conditions
is suggested for future research. The finite parameter set case should
be addressed first.
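As an illustration of the kind of standard computation referred to above, the following sketch iterates a scalar discrete-time Riccati equation to its steady state and reports the resulting steady-state innovations variance for two candidate models. The scalar model form, the parameter values, the function names and the choice of the innovations variance as the compared statistic are all assumptions made for illustration; the uniqueness conditions of the thesis are not reproduced here.

```python
def steady_state_riccati(a, c, q, r, tol=1e-12, max_iter=10000):
    # Fixed-point iteration of the scalar Riccati equation
    #   P <- a^2 P - (a c P)^2 / (c^2 P + r) + q
    # for the assumed model x[k+1] = a x[k] + w[k], z[k] = c x[k] + v[k],
    # with Var(w) = q and Var(v) = r.
    p = q
    for _ in range(max_iter):
        p_next = a * a * p - (a * c * p) ** 2 / (c * c * p + r) + q
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    raise RuntimeError("Riccati iteration did not converge")

def innovations_variance(a, c, q, r):
    # Steady-state variance of the one-step prediction error of z[k].
    p = steady_state_riccati(a, c, q, r)
    return c * c * p + r

# Two hypothetical candidate models (a, c, q, r) compared through one statistic.
print(innovations_variance(0.5, 1.0, 1.0, 1.0))
print(innovations_variance(0.9, 1.0, 0.5, 1.0))
```

For matrix-valued models the same quantities would be obtained from a general Riccati or Lyapunov solver; the scalar recursion is used here only to keep the sketch self-contained.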

7.3  Identifiability  by  Deterministic  Inputs 

The  application  of  deterministic  inputs  to  dynamic  systems  for  the 
purpose  of  identification  and  their  optimal  selection  have  been  addressed 
by  several  authors  (Levadi  [1966],  Gagliardi  [1967],  Nahi  and  Wallis 
[1969], Aoki and Staley [1970], Mehra [1972], Goodwin et al. [1973],
Lopez-Toledo  and  Athans  [1975]).  The  analysis  of  section  6.3  suggests 
a new  approach  to  the  problem.  It  follows  from  theorem  6.3  that  any 
input  sequence  that  satisfies  (6.3)  will  provide  convergence  in  the 
mean  of  the  identification  procedures  at  a certain  rate.  The  condition 
in  (6.3)  also  involves  the  system's  coefficients  and  thus,  the  selected 
deterministic  input  sequence  will  obviously  depend  on  the  nature  of  the 
system  under  consideration.  The  problem  can  then  be  presented  as  follows. 
Under  what  conditions  on  the  true  system  generating  the  observations 
and  on  the  model  set  will  the  identification  procedures  converge  to  a 
model in the set using some input sequence, and what class of input
sequences  will  then  provide  identifiability? 
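The dependence of distinguishability on the choice of input can be illustrated with a small sketch. It does not implement condition (6.3), which is not reproduced in this section. As an assumed stand-in, it accumulates the squared difference between the noise-free outputs of two hypothetical scalar models x[k+1] = a x[k] + b u[k], y[k] = x[k], driven by the same deterministic input; the models, inputs and function names are all illustrative assumptions.

```python
def response(a, b, inputs):
    # Noise-free output of x[k+1] = a x[k] + b u[k], y[k] = x[k], x[0] = 0.
    x, out = 0.0, []
    for u in inputs:
        out.append(x)
        x = a * x + b * u
    return out

def accumulated_gap(model_1, model_2, inputs):
    # Sum of squared differences between the two models' noise-free outputs.
    a1, b1 = model_1
    a2, b2 = model_2
    y1 = response(a1, b1, inputs)
    y2 = response(a2, b2, inputs)
    return sum((p - q) ** 2 for p, q in zip(y1, y2))

n = 200
step = [1.0] * n                          # persistent input
decaying = [0.5 ** k for k in range(n)]   # input that dies out
m1, m2 = (0.5, 1.0), (0.8, 1.0)           # two hypothetical models (a, b)
print(accumulated_gap(m1, m2, step), accumulated_gap(m1, m2, decaying))
```

In this toy setting the gap keeps growing with the record length under the persistent input and levels off under the decaying one, which suggests why the class of admissible inputs has to be tied to the systems under consideration.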

7.4  Other  Application  Areas 

In Chapters 5 and 6 we have applied the general theory derived in
Chapters  3 and  4 to  certain  aspects  of  linear  system  identification  and 
modeling.  Further  investigation  of  modeling  aspects  has  been  suggested 
in  remarks  1 and  2 in  section  5.4.  Other  general  areas  of  application 
which  have  not  been  specifically  addressed  in  this  thesis  are: 

1)  Application  to  certain  classes  of  time  varying  systems, 
such  as  periodically  varying  linear  systems. 

2)  Application to non-linear system identification
problems.

3)  Application to signal detection problems in communication
systems.

